21 External memory algorithms and data structures: dealing with massive data
Jeffrey Scott Vitter 

June 2001 ACM Computing Surveys (CSUR), volume 33 issue 2 

Full text available: pdf(826.46 KB) Additional Information: full citation, abstract, references, citings, index terms

Data sets in large applications are often too massive to fit completely inside the computer's
internal memory. The resulting input/output communication (or I/O) between fast internal 
memory and slower external memory (such as disks) can be a major performance 
bottleneck. In this article we survey the state of the art in the design and analysis of 
external memory (or EM) algorithms and data structures, where the goal is to exploit 
locality in order to reduce the I/O costs. We consider a varie ... 

Keywords: B-tree, I/O, batched, block, disk, dynamic, extendible hashing, external 
memory, hierarchical memory, multidimensional access methods, multilevel memory, 
online, out-of-core, secondary storage, sorting 



22 Efficient string matching: an aid to bibliographic search
Alfred V. Aho, Margaret J. Corasick
June 1975 Communications of the ACM, Volume 18 Issue 6

Full text available: pdf(733.78 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite 
number of keywords in a string of text. The algorithm consists of constructing a finite state 
pattern matching machine from the keywords and then using the pattern matching machine 
to process the text string in a single pass. Construction of the pattern matching machine 
takes time proportional to the sum of the lengths of the keywords. The number of state 
transitions made by the pattern matching ... 

Keywords: bibliographic search, computational complexity, finite state machines, 
information retrieval, keywords and phrases, string pattern matching, text-editing 
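
The abstract above describes building a finite state pattern matching machine from the keywords and then scanning the text in one pass. A minimal Python sketch of that idea follows. The function names (build_automaton, find_all) and the table layout are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto, failure, and output tables for a set of keywords."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # keywords recognized on entering each state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states inherit from their parent's failure chain.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, keywords):
    """Return (start_index, keyword) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(keywords)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

Construction touches each keyword character once while building the trie, which is in line with the abstract's statement that construction time is proportional to the sum of the keyword lengths.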

23 An experimental study of an opportunistic index
Paolo Ferragina, Giovanni Manzini

January 2001 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms

Full text available: pdf(785.55 KB) Additional Information: full citation, abstract, references, citings, index terms

The size of electronic data is currently growing at a faster rate than computer memory and 
disk storage capacities. For this reason compression appears always as an attractive choice, 
if not mandatory. However space overhead is not the only resource to be optimized when 
managing large data collections; in fact data turn out to be useful only when properly 
indexed to support search operations that efficiently extract the user-requested 
information. 

Approaches to combine c ... 

24 A guided tour to approximate string matching
Gonzalo Navarro

March 2001 ACM Computing Surveys (CSUR), Volume 33 Issue 1

Full text available: pdf(1.19 MB) Additional Information: full citation, references, citings, index terms, review

We survey the current techniques to cope with the problem of string matching that allows 
errors. This is becoming a more and more relevant issue for many fast growing areas such 
as information retrieval and computational biology. We focus on online searching and 
mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its 
history and current developments, and the central ideas of the algorithms and their 
complexities. We present a number of experiments to ... 

Keywords: Levenshtein distance, edit distance, online string matching, text searching 
allowing errors 
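
To make the edit distance mentioned in the abstract above concrete, here is a minimal Python sketch of the classical dynamic-programming computation, together with the common variant that scans a text for approximate occurrences. The function names and the choice of unit costs are assumptions for illustration. The survey covers many faster algorithms that this sketch does not attempt to reproduce.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_find(pattern, text, k):
    """End positions in text where pattern matches with at most k errors.

    Same recurrence as edit_distance, but the top cell of every column
    is 0 so that a match may start at any text position.
    """
    col = list(range(len(pattern) + 1))
    hits = []
    for pos, ct in enumerate(text):
        new = [0]
        for i, cp in enumerate(pattern, start=1):
            cost = 0 if cp == ct else 1
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + cost))
        col = new
        if col[-1] <= k:
            hits.append(pos)
    return hits
```

Both functions run in time proportional to the product of the two string lengths and keep only one column (or row) of the table in memory.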



25 A new character-based indexing method using frequency data for Japanese documents

Ogawa Yasushi, Iwasaki Masajirou 

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf Additional Information: full citation, references, citings, index terms



26 Computing curricula 2001

September 2001 Journal on Educational Resources in Computing (JERIC) 

Full text available: pdf(613.63 KB), html(2.78 KB) Additional Information: full citation, references, citings, index terms

27 Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
Roberto Grossi, Jeffrey Scott Vitter 

May 1999 Proceedings of the thirty-second annual ACM symposium on Theory of 
computing 

Full text available: pdf Additional Information: full citation, references, citings, index terms

28 The complexity of searching a sorted array of strings
Arne Andersson, Torben Hagerup, Johan Håstad, Ola Petersson

May 1994 Proceedings of the twenty-sixth annual ACM symposium on Theory of 
computing 

Full text available: pdf(890.71 KB) Additional Information: full citation, references, citings, index terms



29 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), Volume 24 Issue 4 

Full text available: pdf(6.23 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent
word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in
a given word list. In response to the second problem, a variety of general and
application-specific spelling cor ...

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 
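
As a rough illustration of the n-gram analysis mentioned in the abstract above, the following Python sketch flags a token as a likely nonword when it contains a character trigram never seen in a reference word list. The function names, the padding symbol, and the tiny word list are assumptions made for this example and do not come from the survey.

```python
def ngrams(word, n=3):
    """Set of character n-grams of a word, padded with '#' at both ends."""
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_ngram_table(word_list, n=3):
    """Collect every n-gram that occurs in the reference word list."""
    table = set()
    for word in word_list:
        table |= ngrams(word, n)
    return table


def looks_like_nonword(token, table, n=3):
    """True if the token contains at least one n-gram absent from the table."""
    return not ngrams(token, n) <= table


# Hypothetical reference list, for illustration only.
words = ["search", "string", "text", "index"]
table = build_ngram_table(words)
```

A check of this kind is cheap but incomplete: a misspelling built entirely from legal n-grams passes unnoticed, so it detects only some nonwords and says nothing about the correct replacement.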



30 Access methods for text 
Chris Faloutsos 

March 1985 ACM Computing Surveys (CSUR), Volume 17 Issue 1

Full text available: pdf(2.59 MB) Additional Information: full citation, abstract, references, citings, index terms, review

This paper compares text retrieval methods intended for office systems. The operational 
requirements of the office environment are discussed, and retrieval methods from database 
systems and from information retrieval systems are examined. We classify these methods 
and examine the most interesting representatives of each class. Attempts to speed up 
retrieval with special purpose hardware are also presented, and issues such as approximate 
string matching and compression are discussed. A quali ... 

31 Fast searching on compressed text allowing errors

Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates 
August 1998 Proceedings of the 21st annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(1.19 MB) Additional Information: full citation, references, citings, index terms



32 Matching and searching analysis for parallel hardware implementation on FPGAs 
Pablo Moisset, Pedro Diniz, Joonseok Park 

February 2001 Proceedings of the 2001 ACM/SIGDA ninth international symposium on 
Field programmable gate arrays 

Full text available: pdf(166.03 KB) Additional Information: full citation, abstract, references, index terms

Matching and searching computations play an important role in the indexing of data. These 
computations are typically encoded in very tight loops with a single index variable and a 
simple search/ matching predicate. Their inherent sequential nature, either because of data 
dependences but more often because of very strong control dependences, makes it 
impossible to apply existing data dependence and parallelization analysis to exploit
significant levels parallelism on traditional architecture ... 

33 Information gathering in the World-Wide Web: the W3QL query language and the W3QS system

David Konopnicki, Oded Shmueli 

December 1998 ACM Transactions on Database Systems (TODS), volume 23 issue 4 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

The World Wide Web (WWW) is a fast growing global information resource. It contains an 
enormous amount of information and provides access to a variety of services. Since there is 
no central control and very few standards of information organization or service offering, 
searching for information and services is a widely recognized problem. To some degree this 
problem is solved by "search services," also known as "indexers," such as Lycos, AltaVista,
Yahoo, and others. ... 

Keywords: CGI, FORMS, HTML, HTTP, PERL, World-Wide Web, query language, query 
system 



34 A microprogrammed search controller for a text scanning processor 
F. J. Burkowski 

March 1980 ACM SIGIR Forum, Proceedings of the fifth workshop on Computer architecture for non-numeric processing, Volume 15 Issue 2

Full text available: pdf(692.29 KB) Additional Information: full citation, abstract, references, index terms

The objective of the research in this paper is the design of a non-numeric processor to be 
used in the scanning of textual information brought in from serial storage. Source text 
progresses through a linear array of 32 cells each cell capable of holding one character. 
With all cells operating in parallel, character subsequences in the source stream can be 
compared with character strings in any one of 16 registers associated with the cellular 
array. Various modules associated with the array ... 




35 On the encipherment of search trees and random access files
R. Bayer, J. K. Metzger 

March 1976 ACM Transactions on Database Systems (TODS), Volume 1 Issue 1

Full text available: pdf(1.30 MB) Additional Information: full citation, abstract, references, citings, index terms

The securing of information in indexed, random access files by means of privacy 
transformations must be considered as a problem distinct from that for sequential files. Not 
only must processing overhead due to encrypting be considered, but also threats to 
encipherment arising from updating and the file structure itself must be countered. A 
general encipherment scheme is proposed for files maintained in a paged structure in 
secondary storage. This is applied to the encipherment of indexes or ... 



Keywords: B-trees, cryptography, encipherment, indexed sequential files, indexes, paging, 
privacy, privacy transformation, protection, random access files, search trees, security 

36 On the use of regular expressions for searching text
Charles L. A. Clarke, Gordon V. Cormack 

May 1997 ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 19 Issue 3

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

The use of regular expressions for text search is widely known and well understood. It is
then surprising that the standard techniques and tools prove to be of limited use for 
searching structured text formatted with SGML or similar markup languages. Our 
experience with structured text search has caused us to reexamine the current practice. 
The generally accepted rule of "leftmost longest match" is an unfortunate choice and is at 
the root of the difficulties. We instead propose ... 

Keywords: SGML, regular expressions, regular languages 



37 Posters: Efficient K-NN search in polyphonic music databases using a lower bounding mechanism

Ning-Han Liu, Yi-Hung Wu, Arbee L. P. Chen 

November 2003 Proceedings of the 5th ACM SIGMM international workshop on 
Multimedia information retrieval 

Full text available: pdf(506.60 KB) Additional Information: full citation, abstract, references, index terms

Querying polyphonic music from a large data collection is an interesting and challenging 
topic. Recently, researchers attempt to provide efficient techniques for content-based 
retrieval in polyphonic music databases where queries can also be polyphonic. However, 
most of the techniques do not perform the approximate matching well. In this paper, we 
present a novel method to efficiently retrieve k music works that contain segments most 
similar to the user query based on the edit distance. A list-b ... 

Keywords: indexing methods, lower bounded edit distance, polyphonic music information 
retrieval, search process 



38 Parallel text search methods
Gerard Salton, Chris Buckley 

February 1988 Communications of the ACM, Volume 31 Issue 2

Full text available: pdf(1.53 MB) Additional Information: full citation, abstract, references, citings, index terms, review

A comparison of recently proposed parallel text search methods to alternative available 
search strategies that use serial processing machines suggests parallel methods do not 
provide large-scale gains in either retrieval effectiveness or efficiency. 



39 Data clustering: a review

A. K. Jain, M. N. Murty, P. J. Flynn 

September 1999 ACM Computing Surveys (CSUR), volume 31 issue 3 

Full text available: pdf(636.24 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Clustering is the unsupervised classification of patterns (observations, data items, or 
feature vectors) into groups (clusters). The clustering problem has been addressed in many 
contexts and by researchers in many disciplines; this reflects its broad appeal and 
usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult 
problem combinatorially, and differences in assumptions and contexts in different 
communities has made the transfer of useful generic co ... 

Keywords: cluster analysis, clustering applications, exploratory data analysis, incremental 
clustering, similarity indices, unsupervised learning 

40 Streams, structures, spaces, scenarios, societies (5S): a formal model for digital libraries
Marcos André Gonçalves, Edward A. Fox, Layne T. Watson, Neill A. Kipp

April 2004 ACM Transactions on Information Systems (TOIS), Volume 22 issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

Digital libraries (DLs) are complex information systems and therefore demand formal 
foundations lest development efforts diverge and interoperability suffers. In this article, we 
propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and 
Societies (5S), which allow us to define digital libraries rigorously and usefully. Streams are 
sequences of arbitrary items used to describe both static and dynamic (e.g., video) content. 
Structures can be viewed as labeled directed gra ... 

Keywords: applications, definitions, foundations, taxonomy

1 Fast string searching in secondary storage: theoretical developments and experimental
results 

Paolo Ferragina, Roberto Grossi 

January 1996 Proceedings of the seventh annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(1.26 MB) Additional Information: full citation, references, citings, index terms



2 The string B-tree: a new data structure for string search in external memory and its applications

Paolo Ferragina, Roberto Grossi 

March 1999 Journal of the ACM (JACM), Volume 46 Issue 2

Full text available: pdf(363.37 KB) Additional Information: full citation, abstract, references, citings, index terms

We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a 
link between some traditional external-memory and string-matching data structures. In a 
short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is 
made more effective by adding extra pointers to speed up search and update operations. 
Consequently, the String B-Tree overcomes the theoretical limitations of inverted files,
B-trees, prefix B-trees, s ...

Keywords: B-tree, Patricia trie, external-memory data structure, prefix and range search, 
string searching and sorting, suffix array, suffix tree, text index 



3 Provably sensitive indexing strategies for biosequence similarity search
Jeremy Buhler 

April 2002 Proceedings of the sixth annual international conference on Computational 
biology 

Full text available: pdf(1.61 MB) Additional Information: full citation, abstract, references, citings, index terms



The field of algorithms for pairwise biosequence similarity search is dominated by heuristic 
methods of high efficiency but uncertain sensitivity. One reason that more formal string 
matching algorithms with sensitivity guarantees have not been applied to biosequences is 
that they cannot directly find similarities that score highly under substitution score functions 
such as the DNA PAM-TT [20], PAM [9], or BLOSUM [12] families of matrices. We describe a
general technique, score simulatio ... 

4 Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
Roberto Grossi, Jeffrey Scott Vitter 

May 1999 Proceedings of the thirty-second annual ACM symposium on Theory of 
computing 

Full text available: pdf Additional Information: full citation, references, citings, index terms



5 On effective multi-dimensional indexing for strings

H. V. Jagadish, Nick Koudas, Divesh Srivastava 

May 2000 ACM SIGMOD Record, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, Volume 29 Issue 2

Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, citings, index terms

As databases have expanded in scope from storing purely business data to include XML 
documents, product catalogs, e-mail messages, and directory data, it has become 
increasingly important to search databases based on wild-card string matching: prefix 
matching, for example, is more common (and useful) than exact matching, for such data. 
In many cases, matches need to be on multiple attributes/dimensions, with correlations 
between the dimensions. Traditional multi-dimensional index structures, ... 

6 Axis-specified search: a fine-grained full-text search method for gathering and structuring excerpts
Yasusi Kanada 

May 1998 Proceedings of the third ACM conference on Digital libraries 

Full text available: pdf Additional Information: full citation, references, citings, index terms



7 All searches are divided into three parts: string searches using ternary trees
David E. Siegel 

July 1998 ACM SIGAPL APL Quote Quad, Proceedings of the APL98 conference on Array processing language, Volume 29 Issue 3

Full text available: pdf(702.49 KB) Additional Information: full citation, abstract, references, index terms

This paper considers the problem of searching for strings in a dictionary or symbol table. It 
presents a data structure which can be used for this purpose: the Ternary Tree. It
considers the theoretical properties of this structure, compared with other possible 
structures for the same purpose. It presents an implementation of this structure in APL, 
including code to do a variety of operations on it. 

Keywords: data structure, dictionary search, radix search, search algorithms, search tree, 
symbol table, ternary tree 
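
The paper above implements its Ternary Tree in APL. As a language-neutral illustration of the general data structure, here is a minimal ternary search tree in Python. The class and attribute names are assumptions, and the sketch covers only insertion and exact lookup, not the wider set of operations the abstract mentions.

```python
class Node:
    """One character of a key, with three children: lower, equal, higher."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False


class TernaryTree:
    def __init__(self):
        self.root = None

    def insert(self, key):
        if not key:
            raise ValueError("empty keys are not supported in this sketch")
        self.root = self._insert(self.root, key, 0)

    def _insert(self, node, key, i):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.is_end = True
        return node

    def __contains__(self, key):
        node, i = self.root, 0
        while node is not None and i < len(key):
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if i == len(key):
                    return node.is_end
                node = node.eq
        return False
```

Each comparison either advances one character along the key (the equal branch) or moves sideways among siblings, so a lookup behaves like a binary search at every character position.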



8 Implementing a faster string search algorithm in Ada
P. Wood, D. Turcaso 

April 1988 ACM SIGAda Ada Letters, Volume VIII Issue 3

Full text available: pdf Additional Information: full citation, index terms

9 Posters: Efficient K-NN search in polyphonic music databases using a lower bounding mechanism

Ning-Han Liu, Yi-Hung Wu, Arbee L. P. Chen 

November 2003 Proceedings of the 5th ACM SIGMM international workshop on 
Multimedia information retrieval 

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Querying polyphonic music from a large data collection is an interesting and challenging 
topic. Recently, researchers attempt to provide efficient techniques for content-based 
retrieval in polyphonic music databases where queries can also be polyphonic. However, 
most of the techniques do not perform the approximate matching well. In this paper, we 
present a novel method to efficiently retrieve k music works that contain segments most 
similar to the user query based on the edit distance. A list-b ... 

Keywords: indexing methods, lower bounded edit distance, polyphonic music information 
retrieval, search process 



10 Poster Sessions: Chinese string searching using the KMP algorithm
Robert W. P. Luk 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 2

Full text available: pdf(397.79 KB) Additional Information: full citation, abstract, references

This paper is about the modification of KMP (Knuth, Morris and Pratt) algorithm for string 
searching of Chinese text. The difficulty is searching through a text string of single- and
multi-byte characters. We showed that proper decoding of the input as sequences of 
characters instead of bytes is necessary. The standard KMP algorithm can easily be 
modified for Chinese string searching but at the worst-case time-complexity of O(3n) in
terms of the number of comparisons. The finite-automaton ... 
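
The point in the abstract above, that the input must be decoded into characters before matching, can be illustrated with a standard KMP search in Python run once over raw bytes and once over decoded characters. This is a sketch of textbook KMP, not the modified algorithm from the paper. The GBK encoding and the sample bytes are assumptions chosen for the demonstration, since the abstract does not name an encoding.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Start indices of every occurrence of pattern in text (str or bytes)."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits, k = [], 0
    for i, unit in enumerate(text):
        while k and unit != pattern[k]:
            k = fail[k - 1]
        if unit == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


# Two double-byte GBK characters (assumed sample data).
raw_text = b"\xb0\xa1\xb0\xa1"
# The middle two bytes straddle a character boundary but still decode
# as one valid double-byte character on their own.
raw_pattern = raw_text[1:3]

byte_hits = kmp_search(raw_text, raw_pattern)
char_hits = kmp_search(raw_text.decode("gbk"), raw_pattern.decode("gbk"))
```

Here byte_hits reports a match at byte offset 1, in the middle of a character, while char_hits is empty, which is the kind of false match that decoding into characters avoids.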

11 Fast text searching for regular expressions or automaton searching on tries
Ricardo A. Baeza-Yates, Gaston H. Gonnet 

November 1996 Journal of the ACM (JACM), volume 43 issue 6 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms, review

We present algorithms for efficient searching of regular expressions on preprocessed text, 
using a Patricia tree as a logical model for the index. We obtain searching algorithms that 
run in logarithmic expected time in the size of the text for a wide subclass of regular 
expressions, and in sublinear expected time for any regular expression. This is the first such 
algorithm to be found with this complexity. 



12 Database indexing for large DNA and protein sequence collections 
Ela Hunt, Malcolm P. Atkinson, Robert W. Irving 

November 2002 The VLDB Journal - The International Journal on Very Large Data Bases, Volume 11 Issue 3

Full text available: pdf(199.78 KB) Additional Information: full citation, abstract, citings, index terms

Our aim is to develop new database technologies for the approximate matching of 
unstructured string data using indexes. We explore the potential of the suffix tree data 
structure in this context. We present a new method of building suffix trees, allowing us to 
build trees in excess of RAM size, which has hitherto not been possible. We show that this 
method performs in practice as well as the O(n) method of Ukkonen [70]. Using this 
method we build indexes for 200 Mb of protein and 3 ... 

Keywords: Approximate matching, Biological sequence, Database index, Suffix tree



13 Index-driven similarity search in metric spaces
Gisli R. Hjaltason, Hanan Samet 

December 2003 ACM Transactions on Database Systems (TODS), volume 28 issue 4 

Full text available: pdf(650.64 KB) Additional Information: full citation, abstract, references, citings, index terms

Similarity search is a very important operation in multimedia databases and other database 
applications involving complex objects, and involves finding objects in a data set S similar 
to a query object q, based on some similarity measure. In this article, we focus on methods 
for similarity search that make the general assumption that similarity is represented with a 
distance metric d. Existing methods for handling similarity search in this setting typically fall 
into one of... 

Keywords: Hierarchical metric data structures, distance-based indexing, nearest neighbor
queries, range queries, ranking, similarity searching 



14 Efficient string matching: an aid to bibliographic search 
Alfred V. Aho, Margaret J. Corasick 
June 1975 Communications of the ACM, Volume 18 Issue 6

Full text available: pdf(733.78 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite 
number of keywords in a string of text. The algorithm consists of constructing a finite state 
pattern matching machine from the keywords and then using the pattern matching machine 
to process the text string in a single pass. Construction of the pattern matching machine 
takes time proportional to the sum of the lengths of the keywords. The number of state 
transitions made by the pattern matching ... 

Keywords: bibliographic search, computational complexity, finite state machines, 
information retrieval, keywords and phrases, string pattern matching, text-editing 



15 RE-tree: an efficient index structure for regular expressions

Chee-Yong Chan, Minos Garofalakis, Rajeev Rastogi 

August 2003 The VLDB Journal - The International Journal on Very Large Data Bases, Volume 12 Issue 2

Full text available: pdf Additional Information: full citation, abstract, index terms

Abstract. Due to their expressive power, regular expressions (REs) are quickly becoming an 
integral part of language specifications for several important application scenarios. Many of 
these applications have to manage huge databases of RE specifications and need to provide 
an effective matching mechanism that, given an input string, quickly identifies the REs in 
the database that match it. In this paper, we propose the RE-tree, a novel index structure 
for large databases of RE specifications. Gi ... 

Keywords: Index structure, Regular expressions, Sampling-based approximations, Size 
measures 

on APL: part 1, Volume 9 Issue 4
Full text available: pdf(619.22 KB) Additional Information: full citation, abstract, references, index terms

A system is described applicable for information retrieval and update both in formatted and 
unformatted files containing non-numerical data. It uses a new reference-string indexing 
technique which supports partial-match queries and similar-record search. The reference- 
string index is adapted to data usage or data. Those parts of records which are estimated 
by the program to be specified very often in queries are included as reference strings and 
inverted. For data access in the retrieval ph ... 

17 Matching and searching analysis for parallel hardware implementation on FPGAs 
Pablo Moisset, Pedro Diniz, Joonseok Park 

February 2001 Proceedings of the 2001 ACM/SIGDA ninth international symposium on 
Field programmable gate arrays 

Full text available: pdf(166.03 KB) Additional Information: full citation, abstract, references, index terms

Matching and searching computations play an important role in the indexing of data. These 
computations are typically encoded in very tight loops with a single index variable and a 
simple search/ matching predicate. Their inherent sequential nature, either because of data 
dependences but more often because of very strong control dependences, makes it 
impossible to apply existing data dependence and parallelization analysis to exploit 
significant levels parallelism on traditional architecture ... 

18 An experimental study of an opportunistic index

Paolo Ferragina, Giovanni Manzini 

January 2001 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(785.55 KB) Additional Information: full citation, abstract, references, citings, index terms

The size of electronic data is currently growing at a faster rate than computer memory and 
disk storage capacities. For this reason compression appears always as an attractive choice, 
if not mandatory. However space overhead is not the only resource to be optimized when 
managing large data collections; in fact data turn out to be useful only when properly 
indexed to support search operations that efficiently extract the user-requested 
information. 

Approaches to combine c ... 

19 The Jungle database search engine 
Michael Böhlen, Linas Bukauskas, Curtis Dyreson

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, Volume 28 Issue 2

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Information spread in databases cannot be found by current search engines. A database
search engine is capable to access and advertise database on the WWW. Jungle is a 
database search engine prototype developed at Aalborg University. Operating through JDBC 
connections to remote databases, Jungle extracts and indexes database data and meta-data,
building a data store of database information. This information is used to evaluate and
optimize queries in the AQUA query language. AQUA is a na ... 

20 A linear lower bound on index size for text retrieval
Erik D. Demaine, Alejandro López-Ortiz

January 2001 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(438.99 KB) Additional Information: full citation, abstract, references, index terms

Most information-retrieval systems preprocess the data to produce an auxiliary index 
structure. Empirically, it has been observed that there is a tradeoff between query response 
time and the size of the index. When indexing a large corpus, such as the web, the size of 
the index is an important consideration. In this case it would be ideal to produce an index 
that is substantially smaller than the text. 

In this work we prove a linear lower bound on the size of any index that reports th ... 

Huixiang Liu, Timothy C. Lethbridge

November 2001 Proceedings of the 2001 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf(145.13 KB) Additional Information: full citation, abstract, references, index terms

This paper describes a study of what we call intelligent search techniques as implemented 
in a software exploration environment, whose purpose is to facilitate software maintenance. 
The paper first introduces the intelligent search techniques used in our study, including 
abbreviation contraction and abbreviation expansion. Then it describes in detail the rating 
algorithms used to evaluate the query results' similarity to the original query strings. Next, 
we describe a series of experiments we co ... 

Knowledge-based search tactics for an intelligent intermediary system

Philip J. Smith, Steven J. Shute, Beb Galdes, Mark H. Chignell 

July 1989 ACM Transactions on Information Systems (TOIS), Volume 7 Issue 3

Full text available: pdf(1.84 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Research on the nature of knowledge-based systems for bibliographic information retrieval 
is summarized. Knowledge-based search tactics are then considered in terms of their role in 
the functioning of a semantically based search system for bibliographic information 
retrieval, EP-X. This system uses such tactics to actively assist users in defining or refining 
their topics of interest. It does so by applying these tactics to a knowledge base describing 
topics in a particular domain and to a ... 

We survey the current techniques to cope with the problem of string matching that allows
errors. This is becoming a more and more relevant issue for many fast growing areas such 
as information retrieval and computational biology. We focus on online searching and 
mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its 
history and current developments, and the central ideas of the algorithms and their 
complexities. We present a number of experiments to ... 

Keywords: Levenshtein distance, edit distance, online string matching, text searching 
allowing errors 
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P. Wood, D. Turcaso 

April 1988 ACM SIGAda Ada Letters, volume VIII issue 3
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Edgar Chavez, Gonzalo Navarro, Ricardo Baeza-Yates, Jose Luis Marroquin 
September 2001 ACM Computing Surveys (CSUR), Volume 33 Issue 3 

Full text available: pdf(916.04 KB) Additional Information: full citation, abstract, references, citings, index
terms

The problem of searching the elements of a set that are close to a given query element 
under some similarity criterion has a vast number of applications in many branches of 
computer science, from pattern recognition to textual and multimedia information retrieval. 
We are interested in the rather general case where the similarity criterion defines a metric 
space, instead of the more restricted case of a vector space. Many solutions have been 
proposed in different areas, in many cases without cros ... 

Keywords: Curse of dimensionality, nearest neighbors, similarity searching, vector spaces 
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May 1998 Proceedings of the third ACM conference on Digital libraries 
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Palindromes are strings of symbols which are symmetrical about the center. This paper 
outlines a method for generating certain types of palindromes, called lexical palindromes, 
which consist of legitimate English words. The method reported provides substantial 
pruning of a Prolog search tree by calculating the number of success nodes along certain 
search paths instead of visiting them, indexing words to improve database performance, 
and continuous analysis of current states to eliminate non ... 

10 A new string search hardware architecture for VLSI 
K. Takahashi, H. Yamada, H. Nagai, K. Matsumi 

June 1986 ACM SIGARCH Computer Architecture News, Proceedings of the 13th
annual international symposium on Computer architecture, volume 14 issue 2
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This paper presents a new architecture for practical string search hardware design. This 
architecture is based on the finite state automaton design concept using a character control 
charge transfer model. The resultant hardware is a set of programmable sequential logic 
(PSL) circuits, each of which consists of a sequential logic and memory parts. The logic part 
is an array of logical gates, each of which is controlled by the read-out signal from the 
memory part, to connect the flip-flops. T ... 
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Paolo Ferragina, Roberto Grossi 

March 1999 Journal of the ACM (JACM), Volume 46 issue 2 

Full text available: pdf(363.37 KB) Additional Information: full citation, abstract, references, citings, index
terms

We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a 
link between some traditional external-memory and string-matching data structures. In a 
short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is 
made more effective by adding extra pointers to speed up search and update operations. 
Consequently, the String B-Tree overcomes the theoretical limitations of inverted files, B- 
trees, prefix B-trees, s ... 

Keywords: B-tree, Patricia trie, external-memory data structure, prefix and range search, 
string searching and sorting, suffix array, suffix tree, text index 



Software evolution: Generating programming language-based pattern matchers 
Santanu Paul, Atul Prakash 

October 1993 Proceedings of the 1993 conference of the Centre for Advanced Studies 
on Collaborative research: software engineering - Volume 1 

Full text available: pdf(1.55 MB) Additional Information: full citation, abstract, references

This paper is based on a logical extension of our past work in pattern matching tools [22, 
24, 25 ] for reverse engineering. We explore two new directions: first, we investigate the 
need for new and more powerful source code and pattern representations to support a 
richer set of queries; and second, we develop the concept of automatic generation of 
pattern matchers for different programming languages starting from a high-level 
specification of the programming language. A generator will eliminate ... 
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April 2002 ACM Transactions on Information Systems (TOIS), volume 20 issue 2 
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^ terms, review 

Many applications depend on efficient management of large sets of distinct strings in 
memory. For example, during index construction for text databases a record is held for 
each distinct word in the text, containing the word itself and information such as counters. 
We propose a new data structure, the burst trie, that has significant advantages over 
existing options for such applications: it uses about the same memory as a binary search 
tree; it is as fast as a trie; and, while not as fast as a ... 

Keywords: Binary trees, splay trees, string data structures, text databases, tries, 
vocabulary accumulation 



Kimmo Fredriksson, Gonzalo Navarro 

April 2005 Journal of Experimental Algorithmics (JEA), Volume 9 issue es 

Full text available: pdf(1.77 MB) Additional Information: full citation, abstract, references, index terms



We present a new algorithm for multiple approximate string matching. It is based on 
reading backwards enough l-grams from text windows so as to prove that no occurrence 
can contain the part of the window read, and then shifting the window. We show analytically 
that our algorithm is optimal on average. Hence our first contribution is to fill an important 
gap in the area, since no average-optimal algorithm existed for multiple approximate string 
matching. We consider several variants and practical i ... 

Keywords: Algorithms, approximate string matching, biological sequences, multiple string 
matching, optimality 
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spelling correction 

Kemal Oflazer 

March 1996 Computational Linguistics, Volume 22 issue 1 

Full text available: pdf(1.02 MB) Additional Information: full citation, abstract, references, citings
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This paper presents the notion of error-tolerant recognition with finite-state recognizers 
along with results from some applications. Error-tolerant recognition enables the 
recognition of strings that deviate mildly from any string in the regular set recognized by 
the underlying finite-state recognizer. Such recognition has applications to error-tolerant 
morphological processing, spelling correction, and approximate string matching in 
information retrieval. After a description of the concepts an ... 

17 All searches are divided into three parts: string searches using ternary trees 
David E. Siegel 

July 1998 ACM SIGAPL APL Quote Quad, Proceedings of the APL98 conference on
Array processing language, volume 29 issue 3
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This paper considers the problem of searching for strings in a dictionary or symbol table. It 
presents a data structure which can be used for this purpose— the Ternary Tree. It 
considers the theoretical properties of this structure, compared with other possible 
structures for the same purpose. It presents an implementation of this structure in APL, 
including code to do a variety of operations on it. 

Keywords: data structure, dictionary search, radix search, search algorithms, search tree, 
symbol table, ternary tree 
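
The abstract above names the ternary tree as a dictionary structure but does not show it. The following is a minimal sketch of a ternary search tree in Python (the paper itself uses APL). The names `Node`, `insert`, and `contains` are hypothetical and chosen only for illustration; the paper's own operations may differ.

```python
class Node:
    """One node of a ternary search tree: a character and three children."""
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = None       # subtree for characters smaller than ch
        self.eq = None       # subtree for the next character of the word
        self.hi = None       # subtree for characters larger than ch
        self.is_end = False  # True if a stored word ends at this node


def insert(node, word, i=0):
    """Insert a non-empty word and return the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)
    else:
        node.is_end = True
    return node


def contains(node, word):
    """Return True if word was inserted into the tree rooted at node."""
    if not word:
        return False
    i = 0
    while node is not None:
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(word):
            node = node.eq
            i += 1
        else:
            return node.is_end
    return False
```

Each comparison either moves sideways among alternatives for the current character or advances to the next character, which is what distinguishes this structure from a plain binary search tree over whole strings.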



18 A text compression scheme that allows fast searching directly in the compressed file 
Udi Manber 
Additional Information: full citation, abstract, references, citings, index
terms, review

A new text compression scheme is presented in this article. The main purpose of this 
scheme is to speed up string matching by searching the compressed file directly. The 
scheme requires no modification of the string-matching algorithm, which is used as a black 
box; any string-matching procedure can be used. Instead, the pattern is modified; only the 
outcome of the matching of the modified pattern against the compressed file is 
decompressed. Since the compressed file is smal ... 

Keywords: data compression search 



19 A fast string searching algorithm 
Robert S. Boyer, J. Strother Moore 

October 1977 Communications of the ACM, Volume 20 Issue 10

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings

An algorithm is presented that searches for the location, "i," of the first occurrence of a
character string, "pat," in another string, "string." During the search operation, the 
characters of pat are matched starting with the last character of pat. The information 
gained by starting the match at the end of the pattern often allows the algorithm to proceed 
in large jumps ... 

Keywords: bibliographic search, computational complexity, information retrieval, linear 
time bound, pattern matching, text editing 
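
The right-to-left matching idea in the abstract above can be sketched in Python. This is not the paper's full algorithm: it is the simplified Horspool variant, which keeps only a single shift table keyed on the text character aligned with the end of the pattern. The function name `horspool_search` is hypothetical; `pat` and `string` follow the abstract's naming.

```python
def horspool_search(pat: str, string: str) -> int:
    """Return the index of the first occurrence of pat in string, or -1."""
    m, n = len(pat), len(string)
    if m == 0:
        return 0
    # For each character in pat (except the last position), record how far
    # the pattern can shift so that character lines up with the window end.
    shift = {c: m - 1 - k for k, c in enumerate(pat[:-1])}
    i = 0  # left edge of the current window in string
    while i + m <= n:
        j = m - 1
        # Compare starting with the last character of pat, moving leftwards.
        while j >= 0 and string[i + j] == pat[j]:
            j -= 1
        if j < 0:
            return i
        # Characters absent from pat allow a jump of the full pattern length.
        i += shift.get(string[i + m - 1], m)
    return -1
```

When the character under the end of the window does not occur in the pattern at all, the window moves by the full pattern length, which is the source of the "large jumps" the abstract mentions.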



20 On the use of regular expressions for searching text 
Charles L. A. Clarke, Gordon V. Cormack 

May 1997 ACM Transactions on Programming Languages and Systems (TOPLAS),
Volume 19 Issue 3

Full text available: pdf(221.79 KB) Additional Information: full citation, abstract, references, citings, index
terms

The use of regular expressions for text search is widely known and well understood. It is 
then surprising that the standard techniques and tools prove to be of limited use for 
searching structured text formatted with SGML or similar markup languages. Our 
experience with structured text search has caused us to reexamine the current practice. 
The generally accepted rule of "leftmost longest match" is an unfortunate choice and is at 
the root of the difficulties. We instead propose ... 

Keywords: SGML, regular expressions, regular languages 
Full text available: pdf(1.65 MB)




A syntax-directed picture analysis system based on a formal picture description scheme is 
described. The system accepts a description of a set of pictures in terms of a grammar 
generating strings in a picture description language; the grammar is explicitly used to direct 
the analysis or parse, and to control the calls on pattern classification routines for primitive 
picture components. Pictures are represented by directed graphs with labeled edges, where 
the edges denote elementary picture ... 

Probabilistic top-down parsing and language modeling
Brian Roark

June 2001 Computational Linguistics, volume 27 issue 2
Full text available:



Publisher Site 



Additional Information: full citation, abstract, references, citings 



This paper describes the functioning of a broad-coverage probabilistic top-down parser, and 
its application to the problem of language modeling for speech recognition. The paper first 
introduces key notions in language modeling and probabilistic parsing, and briefly reviews 
some previous approaches to using syntactic structure for language modeling. A lexicalized 
probabilistic top-down parser is then presented, which performs very well, in terms of both 
the accuracy of returned parses and the ef ... 
December 1980 ACM Computing Surveys (CSUR), volume 12 issue 4
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We deal with the question as to whether there exists a polynomial time algorithm for 
computing the most probable parse tree of a sentence generated by a data-oriented parsing 
(DOP) model. (Scha, 1990; Bod, 1992, 1993a). Therefore we describe DOP as a stochastic 
tree-substitution grammar (STSG). In STSG, a tree can be generated by exponentially 
many derivations involving different elementary trees. The probability of a tree is equal to 
the sum of the probabilities of all its derivations. We show t ... 

5 Height in a digital search tree and the longest phrase of the Lempel-Ziv scheme 
Charles Knessl, Wojciech Szpankowski 

February 2000 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(797.22 KB) Additional Information: full citation, references, index terms



6 Control 
Gary Lindstrom 

May 1978 Proceedings of the 3rd international conference on Software engineering 

Full text available: pdf(739.96 KB) Additional Information: full citation, abstract, references, citings, index
terms

The range of control structures available in a higher-level programming language directly 
governs the set of algorithms conveniently programmable therein. This fact has been well- 
demonstrated by the salutary effect the ideas of structured programming have had on 
traditional control structures (sequential, iterative, and procedural). This paper seeks to 
demonstrate this same fact for more advanced control structures through the use of top- 
down parsing as a case study. A series of increasing! ... 

7 Student session: On reversing the generation process in Optimality Theory 
J. Eric Fosler 

June 1996 Proceedings of the 34th conference on Association for Computational 
Linguistics 

Full text available: pdf(131 KB) Additional Information: full citation, abstract, references

Publisher Site

Optimality Theory, a constraint-based phonology and morphology paradigm, has allowed 
linguists to make elegant analyses of many phenomena, including infixation and 
reduplication. In this work-in-progress, we build on the work of Ellison (1994) to investigate 
the possibility of using OT as a parsing tool that derives underlying forms from surface 
forms. 

8 Head automata and bilingual tiling: translation with minimal representations 
Hiyan Alshawi 

June 1996 Proceedings of the 34th conference on Association for Computational 
Linguistics 

Full text available:
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We present a language model consisting of a collection of costed bidirectional finite state 
automata associated with the head words of phrases. The model is suitable for incremental 
application of lexical associations in a dynamic programming search for optimal dependency 
tree derivations. We also present a model and algorithm for machine translation involving 
10 LeZi-update: an information-theoretic approach to track mobile users in PCS networks
Amiya Bhattacharya, Sajal K. Das 

August 1999 Proceedings of the 5th annual ACM/IEEE international conference on 
Mobile computing and networking 

Full text available: pdf(1.59 MB) Additional Information: full citation, references, citings, index terms
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Giorgio Satta 

June 1994 Computational Linguistics, volume 20 issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings
Publisher Site 

The computational problem of parsing a sentence in a tree-adjoining language is 
investigated. An interesting relation is studied between this problem and the well-known 
computational problem of Boolean matrix multiplication: it is shown that any algorithm for 
the solution of the former problem can easily be converted into an algorithm for the 
solution of the latter problem. This result bears on at least two important computational 
issues. First, we realize that a straightforward method that impr ... 

12 LeZi-update: an information-theoretic framework for personal mobility tracking in PCS
networks

Amiya Bhattacharya, Sajal K. Das 

March 2002 Wireless Networks, volume 8 issue 2/3 

Full text available: pdf(262.29 KB) Additional Information: full citation, abstract, references, citings, index
terms

The complexity of the mobility tracking problem in a cellular environment has been 
characterized under an information-theoretic framework. Shannon's entropy measure is 
identified as a basis for comparing user mobility models. By building and maintaining a 
dictionary of individual user's path updates (as opposed to the widely used location 
updates), the proposed adaptive on-line algorithm can learn subscribers' profiles. This 
technique evolves out of the concepts of lossless compression. T ... 

Keywords: LZ78 compression, entropy, location management, mobility model, paging, 
update 
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In this paper we present a general parsing strategy that arose from the development of an 
Earley-type parsing algorithm for TAGs (Schabes and Joshi 1988) and from recent linguistic
work in TAGs (Abeille 1988). In our approach elementary structures are associated with
their lexical heads. These structures specify extended domains of locality (as compared to a 
context-free grammar) over which constraints can be stated. These constraints either hold 
within the elementary structure itself or specify ... 

14 The FINITE STRING newsletter: Abstracts of current literature 
Computational Linguistics Staff 

July 1986 Computational Linguistics, volume 12 issue 3 
Full text available: pdf(2.25 MB)
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Graham Cormode, S. Muthukrishnan 

January 2002 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(1.13 MB) Additional Information: full citation, abstract, references, citings

The edit distance between two strings S and R is defined to be the minimum number of 
character inserts, deletes and changes needed to convert R to S. Given a text string t of 
length n, and a pattern string p of length m, informally, the string edit distance matching 
problem is to compute the smallest edit distance between p and substrings of t. A well
known dynamic programming algorithm takes time O(nm) to solve ...
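
The O(nm) dynamic program the abstract refers to can be sketched as follows. This is the textbook formulation (row 0 is all zeros so that a match may start at any text position), not the faster method the paper goes on to propose; the function name `best_match_distance` is hypothetical.

```python
def best_match_distance(p: str, t: str) -> int:
    """Smallest edit distance between p and any substring of t, in O(nm) time."""
    m = len(p)
    # col[i] holds the distance between p[:i] and the best substring of t
    # ending at the current text position; initially the text is empty.
    col = list(range(m + 1))
    best = col[m]
    for c in t:
        prev_diag = col[0]  # value of the cell diagonally up-left
        col[0] = 0          # an empty pattern prefix matches anywhere at cost 0
        for i in range(1, m + 1):
            cur = min(
                prev_diag + (p[i - 1] != c),  # change (or match at no cost)
                col[i] + 1,                   # extra text character
                col[i - 1] + 1,               # missing pattern character
            )
            prev_diag = col[i]
            col[i] = cur
        best = min(best, col[m])
    return best
```

Only one column of the table is kept at a time, so the sketch uses O(m) space while still doing O(nm) work.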

16 Papers: Generation of paraphrases from ambiguous logical forms
Hadar Shemtov 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 2

Full text available: pdf(563.25 KB) Additional Information: full citation, abstract, references

This paper presents a method for generating multiple paraphrases from ambiguous logical 
forms. The method is based on a chart structure with edges indexed on semantic 
information and annotations that relate edges to the semantic facts they express. These 
annotations consist of logical expressions that identify particular realizations encoded in the 
chart. The method allows simultaneous generation from multiple interpretations, without 
hindering the generation process or causing any work to be su ... 

17 Papers: Restricted parallelism in object-oriented lexical parsing
Peter Neuhaus, Udo Hahn 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 1

Full text available: pdf(656.57 KB) Additional Information: full citation, abstract, references

We present an approach to parallel natural language parsing which is based on a 
concurrent, object-oriented model of computation. A depth-first, yet incomplete parsing 
algorithm for a dependency grammar is specified and several restrictions on the degree of 
its parallelization are discussed. 

C. N. Fischer, D. R. Milton, S. B. Quiring 

January 1977 Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of 
programming languages 
An LL(1)-based error-corrector which operates by "insertion-only" is studied. The corrector
is able to correct and parse any input string. It is efficient (linear in space and time 
requirements) and chooses least-cost insertions (as defined by the user) in correcting 
syntax errors. Moreover, the error-corrector can be generated automatically from the
grammar and a table of terminal symbol insertion costs. The class of LL(1) grammars 
correctable by this method contains (with minor modifications) ... 

19 Modeling for text compression

Timothy Bell, Ian H. Witten, John G. Cleary

December 1989 ACM Computing Surveys (CSUR), Volume 21 Issue 4

Full text available: pdf(3.54 MB) Additional Information: full citation, abstract, references, citings, index
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The best schemes for text compression use large models to help them predict which 
characters will come next. The actual next characters are coded with respect to the 
prediction, resulting in compression of information. Models are best formed adaptively, 
based on the text seen so far. This paper surveys successful strategies for adaptive 
modeling that are suitable for use in practical text compression systems. The strategies fall 
into three main classes: finite-context modeling, i ... 

20 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), volume 24 Issue 4 

Full text available: pdf(6.23 MB) Additional Information: full citation, abstract, references, citings, index
terms, review

Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent
word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in
a given word list. In response to the second problem, a variety of general and application-specific
spelling cor ...

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 
1 String storage and searching for data base applications: implementation on the INDY
backend kernel
George P. Copeland 

August 1978 , Volume 10 , 13 , 7 Issue 1,2,2 

Full text available: pdf(986.51 KB) Additional Information: full citation, abstract, references

User and hardware cost trends dictate that data base systems should provide more 
complete functionality, simplicity of use, and reliability by increasing the amount of 
hardware present in the system. These goals are accomplished with a simple hardware 
arrangement within a one-dimensional cellular storage system called INDY. The INDY 
backend kernel is intended as a powerful tool for implementing all data models. The INDY 
cellular storage array is intended to provide functionality that is difficul ... 

2 String storage and searching for data base applications: implementation on the INDY
backend kernel
George P. Copeland 

August 1978 Proceedings of the fourth workshop on Computer architecture for non- 
numeric processing 



Full text available: pdf(854.23 KB)



Additional Information: full citation, abstract, references , citings, index 
terms 



User and hardware cost trends dictate that data base systems should provide more 
complete functionality, simplicity of use, and reliability by increasing the amount of 
hardware present in the system. These goals are accomplished with a simple hardware 
arrangement within a one-dimensional cellular storage system called INDY. The INDY 
backend kernel is intended as a powerful tool for implementing all data models. The INDY 
cellular storage array is intended to provide functionality that is dif ... 



Fast and flexible word searching on compressed text 

Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates 

April 2000 ACM Transactions on Information Systems (TOIS), Volume 18 Issue 2 



Full text available: pdf(165.20 KB)



Additional Information: full citation, abstract, references, citings, index
terms, review 



We present a fast compression technique for natural language texts. The novelties are that 
(1) decompression of arbitrary portions of the text can be done very efficiently, (2) exact 
search for words and phrases can be done on the compressed text directly, using any 
known sequential pattern-matching algorithm, and (3) word-based approximate and 
extended search can also be done efficiently without any decoding. The compression
scheme uses a semistatic word-based model and a Huffman code wher ... 

Keywords: compressed pattern matching, natural language text compression, word 
searching, word-based Huffman coding 



4 Average-optimal single and multiple approximate string matching
Kimmo Fredriksson, Gonzalo Navarro 

April 2005 Journal of Experimental Algorithmics (JEA), Volume 9 issue es 

Full text available: ^ pdf(1.77 MB) Additional Information: full citation, abstract, references, index terms 

We present a new algorithm for multiple approximate string matching. It is based on 
reading backwards enough l-grams from text windows so as to prove that no occurrence 
can contain the part of the window read, and then shifting the window. We show analytically 
that our algorithm is optimal on average. Hence our first contribution is to fill an important 
gap in the area, since no average-optimal algorithm existed for multiple approximate string 
matching. We consider several variants and practical i ... 

Keywords: Algorithms, approximate string matching, biological sequences, multiple string 
matching, optimality 



5 Fast text searching: allowing errors
Sun Wu, Udi Manber 

October 1992 Communications of the ACM, volume 35 issue 10 

Full text available: pdf(5.33 MB) Additional Information: full citation, references, citings, index terms, review



Keywords: approximate string matching, information retrieval, pattern matching, software 
tools, string searching 



6 A guided tour to approximate string matching
Gonzalo Navarro 

March 2001 ACM Computing Surveys (CSUR), Volume 33 issue 1

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings, index
terms, review

We survey the current techniques to cope with the problem of string matching that allows 
errors. This is becoming a more and more relevant issue for many fast growing areas such 
as information retrieval and computational biology. We focus on online searching and 
mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its 
history and current developments, and the central ideas of the algorithms and their 
complexities. We present a number of experiments to ... 

Keywords: Levenshtein distance, edit distance, online string matching, text searching 
allowing errors 



7 A new approach to text searching

Ricardo Baeza-Yates, Gaston H. Gonnet 

October 1992 Communications of the ACM, Volume 35 issue 10 

Full text available: pdf(5.31 MB) Additional Information: full citation, references, citings, index terms
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Keywords: string matching, text searching 



8 AMcrggrgflra^ 
F. J. Burkowski 

March 1980 ACM SIGIR Forum, Proceedings of the fifth workshop on Computer
architecture for non-numeric processing, volume 15 issue 2
Full text available: pdf(692.29 KB) Additional Information: full citation, abstract, references, index terms

The objective of the research in this paper is the design of a non-numeric processor to be 
used in the scanning of textual information brought in from serial storage. Source text 
progresses through a linear array of 32 cells each cell capable of holding one character. 
With all cells operating in parallel, character subsequences in the source stream can be 
compared with character strings in any one of 16 registers associated with the cellular 
array. Various modules associated with the array ... 

9 Fast searching on compressed text allowing errors

Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates
August 1998 Proceedings of the 21st annual international ACM SIGIR conference on
Research and development in information retrieval

Full text available: pdf(119 MB) Additional Information: full citation, references, citings, index terms



10 Associative Processor Architecture — a Survey 
S. S. Yau, H. S. Fung 

January 1977 ACM Computing Surveys (CSUR), volume 9 issue 1

Full text available: pdf(1.87 MB) Additional Information: full citation, references, citings, index terms



11 Exploiting parallelism in pattern matching: an information retrieval application

Victor Wing-Kit Mak, Kuo Chu Lee, Ophir Frieder 

January 1991 ACM Transactions on Information Systems (TOIS), volume 9 issue 1

Full text available: pdf(142 MB) Additional Information: full citation, abstract, references, citings, index
terms, review

We propose a document-searching architecture based on high-speed hardware pattern 
matching to increase the throughput of an information retrieval system. We also propose a 
new parallel VLSI pattern-matching algorithm called the Data Parallel Pattern Matching 
(DPPM) algorithm, which serially broadcasts and compares the pattern to a block of data in 
parallel. The DPPM algorithm utilizes the high degree of integration of VLSI technology to 
attain very high-speed processing through parallelism. ... 



Keywords: DPPM, pattern matcher 



12 A new approach to text searching
R. A. Baeza-Yates, G. H. Gonnet 

May 1989 ACM SIGIR Forum, Proceedings of the 12th annual international ACM
SIGIR conference on Research and development in information retrieval,
Volume 23 Issue 1-2

Full text available: pdf(591.96 KB) Additional Information: full citation, abstract, citings, index terms

We introduce a family of simple and fast algorithms for solving the classical string matching 
problem, string matching with don't care symbols and complement symbols, and multiple
patterns. In addition we solve the same problems allowing up to k mismatches. Among the 
features of these algorithms are that they are real time algorithms, they don't need to 
buffer the input, and they are suitable to be implemented in hardware. 

13 Fortran 8X draft
Loren P. Meissner 

December 1989 ACM SIGPLAN Fortran Forum, Volume 8 Issue 4 

Full text available: pdf(21.36 MB) Additional Information: full citation, abstract, index terms

Standard Programming Language Fortran. This standard specifies the form and 
establishes the interpretation of programs expressed in the Fortran language. It consists of 
the specification of the language Fortran. No subsets are specified in this standard. The 
previous standard, commonly known as "FORTRAN 77", is entirely contained within this 
standard, known as "Fortran 8x". Therefore, any standard-conforming FORTRAN 77 
program is standard conforming under this standard. New features can b ... 

14 Operational characteristics of a hardware-based pattern matcher

Roger L. Haskin, Lee A. Hollaar 

March 1983 ACM Transactions on Database Systems (TODS), volume 8 issue 1

Full text available: pdf(1.84 MB) Additional Information: full citation, abstract, references, citings, index
terms

The design and operation of a new class of hardware-based pattern matchers, such as 
would be used in a backended database processor in a full-text or other retrieval system, is 
presented. This recognizer is based on a unique implementation technique for finite state 
automata consisting of partitioning the state table among a number of simple digital 
machines. It avoids the problems generally associated with implementing finite state 
machines, such as large state table memories, complex cont ... 

Keywords: backend processors, computer system architecture, finite state automata, full 
text retrieval systems, text searching 



15 Fast detection of communication patterns in distributed executions

Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced Studies 
on Collaborative research 

Full text available: pdf(421 KB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based on 
process-time diagrams are often used to obtain a better understanding of the execution of 
the application. The visualization tool we use is Poet, an event tracer developed at the 
University of Waterloo. However, these diagrams are often very complex and do not provide 
the user with the desired overview of the application. In our experience, such tools display 
repeated occurrences of non-trivial commun ... 

16 Hardware for searching very large text databases
Roger Haskin 

March 1980 ACM SIGIR Forum , Proceedings of the fifth workshop on Computer 
architecture for non-numeric processing, volume 15 issue 2 

Full text available: pdf(812.50 KB) Additional Information: full citation, abstract, references, citings, index terms

This paper discusses the problem of searching very large text databases. It is shown that 
conventional techniques for searching current databases cannot be scaled up to larger ones, 
and that it is necessary to build hardware to search the database in parallel if reasonable 
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search times are expected. The part of the search process requiring the highest bandwidth 
is scanning the database to detect instances of search terms. Methods of doing this in 
hardware that have been mentioned in the lit ... 

17 Fitting processors to the needs of a General Purpose Array (EGPA)
Wolfgang Handler, Rainer Klar 

September 1975 Proceedings of the 8th annual workshop on Microprogramming 

Full text available: pdf(781.81 KB) Additional Information: full citation, abstract, references, citings, index terms

1.1 General Purpose Processor The vast majority of contemporary processors can be 
described as general purpose processors (GPP), with a structure often referred to as "von 
Neumann". 1.2 Associative Array Processor In addition special processors have been 
developed for dedicated applications, some involving parallel processing. The associative 
array processors (AAP) [2] or the synchronous array pro ... 

18 Microprocessors for non-numeric processing
S. G. Zaky 

January 1977 Proceedings of the 3rd workshop on Computer architecture : Non- 
numeric processing, Volume 6 , 9 , 12 Issue 2,2,1 

Full text available: pdf(629.28 KB) Additional Information: full citation, references, citings, index terms

The problem of processing of non-numeric data has received considerable attention in the 
last few years. This is primarily motivated by the pressing needs in the area of data base 
management. It has long been recognized that the parallel processing capabilities of an 
associative processor are fundamentally well suited to this environment. However, the 
complexity and cost of truly associative memories make this approach impractical. In this 
paper, the demands that non-numeric processing pla ... 

19 Information Retrieval: PEEKABIT, computer offspring of punched card PEEKABOO, for natural language searching
Fred C. Hutton 

September 1968 Communications of the ACM, volume 11 issue 9 

Full text available: pdf(578.18 KB) Additional Information: full citation, abstract, references

The "peekaboo" idea from punched card information retrieval methods has been mated with 
the idea of superimposed punching to produce a programming technique which cuts 
computer run time in half on a test search of 33,000 subject index entries. A search 
program using the device has been operational since late 1963. As an item is entered in the 
store, an 18-byte mask is created from the item's meaningful words using the inclusive OR 
operation. If, at search time, the logical produ ... 

Keywords: computer search technique, information compaction, natural language 
searching, peekaboo, superimposed coding, text searching 



20 An efficient normalized maximum likelihood algorithm for DNA sequence compression
Gergely Korodi, Ioan Tabus

January 2005 ACM Transactions on Information Systems (TOIS), volume 23 issue 1 
Full text available: pdf(426.79 KB) Additional Information: full citation, abstract, references, index terms

This article presents an efficient algorithm for DNA sequence compression, which achieves 
the best compression ratios reported over a test set commonly used for evaluating DNA 
compression programs. The algorithm introduces many refinements to a compression 
method that combines: (1) encoding by a simple normalized maximum likelihood (NML) 
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model for discrete regression, through reference to preceding approximate matching blocks, 
(2) encoding by a first order context coding and (3) representing str ... 

Keywords: Approximate sequence matching, DNA compression, normalized maximum 
likelihood model 
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Searching for include files... Searching for example files ... 

... Generating namespace index... Generating docs for namespace Mask ... 

Generating page index... Generating graph info page... Generating search index. ... 

hepwww.ph.qmul.ac.uk/l1calo/build/nightly/ logfiles/i686-rh73-gcc32--opt/monFramework.doxygen - 8k -

Patent 5142631: System for queuing individual read or write mask ... 
... instruction write mask generating means coupled to said instruction ... 
that mask with an index-register from the most recently decoded specifier. ... 

www.freepatentsonline.com/5142631.html - 81k - Cached - Similar pages

Novell Documentation: QuickFinder Server - Creating Indexes 

... (Optional) If you want to mask the actual URL displayed in the search results 

... Generating an index is the actual process where QuickFinder Server ... 

www.novell.com/documentation/ qfserver40/qfserver/data/acdvio8.html - 43k - Cached - Similar pages

Circuit Specialists Inc :: View topic - Generating "Solder Mask ... 
Circuit Specialists Inc Forum Index, Circuit Specialists Inc ... 4:56 pm Post 
subject: Generating "Solder Mask" Files with PCB Wizard PRO, Reply with quote ... 

www.circuitspecialists.com/ phpBB2/viewtopic.php?p=64& - 26k - Cached - Similar pages

SleepQuest - Sleep Disorders - Sleep Glossary A to Z - C INDEX 
... and prevents collapse of the upper airway by generating a prescribed level of 
... Air pressure is delivered through a hose to a mask that fits over the ... 

www.sleepquest.com/s_sleeptopics_c.html - 20k - Cached - Similar pages

PSIgate - Physical Sciences Information Gateway - Web Catalogue ... 
... Search For Term: Term (Index) Definition blank part of the mask transparent to 
... laser capable of generating very short wavelength (below 200 nm) UV ... 

www.psigate.ac.uk/roads/cgi-bin/search_ webcatalogue2.pl?limit=100&term1=radiation - 23k -
Cached - Similar pages

PSIgate - Physical Sciences Information Gateway - Web Catalogue ...
... Search For Term: Term (Index) Definition etch mask material blocking etching in 
... laser capable of generating very short wavelength (below 200 nm) UV ... 

www.psigate.ac.uk/roads/cgi-bin/search_ webcatalogue2.pl?limit=875&term1=dictionary - 37k -
Cached - Similar pages

Searching Products By Price - Range: 0 And 10 

... Noise is used typically used to mask distracting sounds in a pleasing way. 

... Most of the "sleep" machines on the market are based on generating ... 

www.luxevivant.com/ index.asp?PageAction=PRICESEARCH&Range=0%20And%2010 - 86k -
Cached - Similar pages

IIS Web Server Security - Mask Windows Web Server with ServerMask 
... extensions is a good practice to mask the technology generating dynamic pages. 
... search engine optimization and search engine marketing articles are ... 

www.seoconsultants.com/articles/1000/security.asp - 29k - May 19, 2005 - Cached - Similar pages
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Majordomo-Users: Wilma / Glimpse 

... #INDEX_BACKGROUND = # [ wilma ] # Color name or mask for index page if 
INDEX_BACKGROUND not ... which rebuilds the search index for the # archives. ... 

www.greatcircle.com/lists/majordomo-users/ mhonarc/majordomo-users.199906/msg00206.html - 14k -
Cached - Similar pages
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Did you mean: match character sets characters search string bitmask 
PHP: String Functions - Manual 

... string to an 8 bit string; quotemeta - Quote meta characters ... of the first 
n characters; strpbrk - Search a string for any of a set of characters ... 

php.benscom.com/manual/pl/ref.strings.php - 47k - Cached - Similar pages

string 

... Search string 2 for a sequence of characters that exactly match the characters 
... set length [string length $string] if {$length == 0} { set isPrefix 0 ... 

tmml.sourceforge.net/doc/tcl/string.html - 19k - Cached - Similar pages

Filters 

... the first successful match in the FST, so if one of the search strings is ... 
128 Table has top bit set characters (ISO standard) 64 Table has numeric ... 

www.zxplus3e.plus.com/z88forever/dn327/filters.htm - 16k - Cached - Similar pages

Methods 

... The set of characters to look for. mask, Mask values determining how to search. 
... Abstract: Test if a string contains only the characters from a set. ... 

homepage.mac.com/... /NSString+NDUtilities/ Categories/NSString_NDUtilities_/Methods/Methods.html - 26k -
Cached - Similar pages

Writing Apache Modules with Perl and C 

... You provide a search pattern, a string to search, and a bit mask of option flags. 
... will attempt to match the given string against the character mask. ... 

files.printf.dk/docs/apache__perl/156.htm - 19k - Cached - Similar pages

efg's Delphi Strings 

... A valid mask consists of literal characters, sets, and wildcards. ... A set 
must match a single character in the string. The character matches the set ... 

www.efg2.com/Lab/Library/Delphi/Strings/ - 92k - Cached - Similar pages

SQL and PL/SQL Programming in a Global Environment 

... representation of a character string in one character set to another. ... 

match a portion of one character value to another by searching the first value ... 

www.lc.leidenuniv.nl/awcourse/ oracle/server.920/a96529/ch7.htm - 54k - Cached - Similar pages

SQL Programming 

... conditions match a portion of one character value to another by searching the 
first ... strings using characters as defined by the input character set. ... 

zuse.esnig.cifom.ch/database/doc_oracle/ Oracle901_Linux/server.901/a90236/ch7.htm - 44k -
Cached - Similar pages

[PPT] How To Read Research Papers

File Format: Microsoft Powerpoint 97 - View as HTML

... to bit mask and then comparing. Finds sets of words easier (phrases) ... 

of the search string, starting on the next character from each confirmed miss ... 
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Tel Built-in Commands - string manual page 

... Search string 2 for a sequence of characters that exactly match the ... equal to 
string except that any leading or trailing characters from the set given ... 

nnsa.dl.ac.uk/MIDAS/manual/ActiveTcl8.4.9.0-html/tcl/TclCmd/string.htm - 23k - Cached - Similar pages
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Boost. Reaex: match flag type 

... The type match_flag_type is an implementation specific bitmask type ... 

of the character container sequence being searched that do match the regular ... 

www.boost.org/libs/regex/doc/match_flag_type.html - 13k - Cached - Similar pages

What is a hex editor? 

... the specified byte, ability to search for a single byte or character string. 
... all open files; bookmark all, replace all, alignment, wildcards/bitmask ... 

www.tech-faq.com/hex-editor.shtml - 24k - May 19, 2005 - Cached - Similar pages

Longest Common Substring 

... If that character does occur in the target, you might be able to use a bitmask 
... to find a match for one entire string anywhere within another string. ... 

use.perl.org/comments.pl?sid=22777&op=&threshold=0&commentsort=0&mode=thread&tid=34&a... - 30k -
Cached - Similar pages

Unicode Transformation Formats 

... the corresponding simple set of characters such as ABC's character range ... 
String searches (fgrep) for a multibyte character beginning with a lead ... 

czyborra.com/utf/ - 59k - Cached - Similar pages

InfoType 

... bitmask enumerating the clauses in the CREATE CHARACTER SET statement, ... 
string containing all special characters (that is, all characters except a ... 

www.canaimasoft.com/f90sql/ OnLineManual/Appendixl/InfoTypeTable.html - 119k - Cached - Similar pages

Single Round Match 154 Statistics at TopCoder 

... We iterate through each character of the string. ... We can represent the set 

of current states as a simple bitmask (eg, an array of boolean values), ... 

www.topcoder.com/index?t=statistics&c=srm154_prob - 27k - Cached - Similar pages

Single Round Match 158 Statistics at TopCoder 

... Since we only care about at most 25 colors, we can use a bitmask to represent 

... and each pattern is 5 characters long, the position of the pads can be ... 

www.topcoder.com/index?t=statistics&c=srm158_prob - 26k - Cached - Similar pages

Command Appendix B, The SQL GetInfo Function

... Null-terminated character string,. 16-bit integer value,. 32-bit bitmask, ... 

match metacharacters underscore (_) and percent (%) as valid characters in ... 

www.4d.fr/documentation/4DDoc67/CMU/CMU11922.HTM - 89k - Cached - Similar pages

Documentation 

... (Maximum number of strings: 5; Maximum string length: 32 characters, ... 

An address bitmask of decimal numbers that represent the address bits to match. ... 

docs.us.dell.com/support/ edocs/network/5P788/CLIG/snmp.htm - 24k - May 18, 2005 - Cached - Similar pages
Appendix E. RabbitNet 

... and one character liquid crystal display from 1 * 8 to 4 * 40 characters ... 
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status flags see MATCH macros below unsigned int ports; // port bitmask ... 

www.rabbitsemiconductor.com/documentation/ docs/manuals/PowerCoreFLEX/UsersManual/erabbitn.htm - 46k -
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Generating Search Index for ht://Dig. 
Generating Search Index for ht://Dig. Login:. Password: 

www.puzzlehistory.com/cgi-bin/rundig - 1k - Cached - Similar pages

Generating the search index 

Generating the search index. The search system is based on the ht://dig search 
engine. ... The basic command for generating the search index is: ... 

osr507doc.sco.com/en/DSK_docview/SearchDV.html - 7k - Cached - Similar pages



DDJ 

... Generating time-lapse animations starts with acquiring and storing images, 
... DDJ's Index to Advertisers makes it easy for you to access product ... 

www.ddj.com/ - Similar pages

Simple URLs for Search Engine Robots: SearchTools Report 

Search Tools Reports. Generating Simple URLs for Search Engines ... Some public 

search engines and most site and intranet search engines will index URLs ... 

www.searchtools.com/robots/goodurls.html - 24k - Cached - Similar pages

One-Step Webpages by Stephen P. Morse 

... Soundex: Generating American and Daitch-Mokotoff Soundex Codes in One Step 
... Brooklyn 1925 Name Index: Searching the Brooklyn 1925 Census in One Step ... 

stevemorse.org/ - 29k - May 18, 2005 - Cached - Similar pages

How Search Works - Lycos InSite

... When a result set is requested, the search index evaluates the content in the 
Web index and the paid index. Generating Search Results ... 

insite.lycos.com/ searchenginemarketing/howsearchworks.asp - 10k - Cached - Similar pages

Searching for include files... Searching for example files ...

... Generating namespace member index... Generating page index... Generating graph 

info page... Generating search index... Generating stylesheet... 

hepwww.ph.qmul.ac.uk/l1calo/build/ l1calo-00-00-01/logfiles/linux-gcc/bbmServices.doxygen - 6k -
Cached - Similar pages

RCCD LAMP - Searching Indexes 

... Selecting a database or index Subject searching Keyword searching Identifying 
concepts and generating search terms AND, OR, and NOT (Boolean operators) ... 

library.rcc.edu/searchingindexes.htm - 48k - Cached - Similar pages

File Manager/DB2 Data V5R1 User's Guide - Contents

... Generating a LISTDEF statement in the DB2 utility job (DB2 Version 7 and higher) 

... RECOVER Utility (Index Spaces) with LISTDEF panel ... 

publib.boulder.ibm.com/infocenter/pdthelp/ topic/com.ibm.filemanager5.doc/db2/fmnu2e0102.htm - 84k -
Cached - Similar pages

Re: Generating Namazu Indexes 

... Search index for a list is not generated if the Name: starts with a ... 



http://www.google.com/search?hl=en&lr=&q=generating+search+index 
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Generating Namazu Indexes, jasonc. Re: Generating Namazu Indexes, ... 

www.mhonarc.org/archive/html/ mharc-users/2003-09/msg00007.html - 8k - Cached - Similar pages
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Search within results | Language Tools | Search Tips | Dissatisfied? Help us improve 



Google Home - Advertising Programs - Business Solutions - About Google 

©2005 Google 
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" Search History Transcript 
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WEST Search History 

DATE: Friday, May 20, 2005 



Set Name   Query                                             Hit Count

DB=PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD; PLUR=YES; OP=ADJ

L17        L16 and search$                                   1
L16        L15 and index$                                    5
L15        L14 and (character near5 string)                  6
L14        (bit mask).ab.                                    567
L13        bitmask.ab.                                       567
L12        L11 and (mask$ same match$)                       0
L11        L10 and match$                                    19
L10        L9 and (search$ near5 index$)                     55
L9         (character$1 and string$1).ti.                    3714
L8         5377349.uref.                                     8
L7         L1 and mas$                                       2
L6         6470347.uref                                      0
L5         6470347.uref.                                     0
L4         5841376.uref                                      7
L3         L2 and column$1                                   8
L2         L1 and input$                                     186
L1         (search$ and string$1 and character$1).ti.        338



END OF SEARCH HISTORY 



http://westbrs:9000^in/cgi-bin/srchhist.pl?state= : 41321f. 1 8. l&f=ffsearch&userid=schannavajj . 
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Record List Display 



Page 1 of 2 



Hit List 



Search Results - Record(s) 1 through 1 of 1 returned. 



□ 1. Document ID: US 20040190526 A1

L17: Entry 1 of 1    File: PGPB    Sep 30, 2004

PGPUB-DOCUMENT-NUMBER: 20040190526
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20040190526 A1

TITLE: Method and apparatus for packet classification using a forest of hash tables 
data structure 

PUBLICATION-DATE: September 30, 2004

INVENTOR-INFORMATION:

NAME             CITY          STATE   COUNTRY   RULE-47
Kumar, Alok      Santa Clara   CA      US
Yavatkar, Raj    Portland      OR      US

US-CL-CURRENT: 370/395.21; 370/395.32






Term                                      Documents
SEARCH$                                   0
SEARCH                                    330046
SEARCHA                                   4
SEARCHAB                                  1
SEARCHABILITIES                           1
SEARCHABILITY                             145
SEARCHABILITYUPDATE                       1
SEARCHABL                                 1
SEARCHABLE                                6048
SEARCHABLEBATHS                           2
"SEARCHABLEDEEDLISTIMPLEMENTOR.JAVA"      1
(L16 AND 
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Hit List 



Your wildcard search against 10000 terms has yielded the results below. 
Your result set for the last L# is incomplete. 
The probable cause is use of unlimited truncation. Revise your search strategy to use limited truncation. 




Search Results - Record(s) 1 through 5 of 5 returned.

□ 1. Document ID: US 20040190526 A1

L16: Entry 1 of 5    File: PGPB    Sep 30, 2004

PGPUB-DOCUMENT-NUMBER: 20040190526
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20040190526 A1

TITLE: Method and apparatus for packet classification using a forest of hash tables 
data structure 

PUBLICATION-DATE: September 30, 2004

INVENTOR-INFORMATION:

NAME             CITY          STATE   COUNTRY   RULE-47
Kumar, Alok      Santa Clara   CA      US
Yavatkar, Raj    Portland      OR      US

US-CL-CURRENT: 370/395.21; 370/395.32

□ 2. Document ID: US 20030081846 A1

L16: Entry 2 of 5    File: PGPB    May 1, 2003

PGPUB-DOCUMENT-NUMBER: 20030081846
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20030081846 A1

TITLE: Digital image transmission with compression and decompression 

PUBLICATION-DATE: May 1, 2003

INVENTOR-INFORMATION:

NAME                    CITY      STATE   COUNTRY   RULE-47
Whitehead, Jeffrey A.   Oakland   CA      US



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US-CL-CURRENT: 382/239; 382/166 
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'it |" Rft'y'iei.' 1 



□ 3. Document ID: US 6118899 A

L16: Entry 3 of 5    File: USPT    Sep 12, 2000

US-PAT-NO: 6118899

DOCUMENT-IDENTIFIER: US 6118899 A

TITLE: Method for lossless bandwidth compression of a series of glyphs 

DATE-ISSUED: September 12, 2000

INVENTOR-INFORMATION:

NAME                     CITY               STATE   ZIP CODE   COUNTRY
Bloomfield; Marc Alan    Lighthouse Point   FL
Krantz; Jeffrey Isaac    Boca Raton         FL

US-CL-CURRENT: 382/233; 341/55, 382/244, 709/247






□ 4. Document ID: US 6081623 A

L16: Entry 4 of 5    File: USPT    Jun 27, 2000

US-PAT-NO: 6081623

DOCUMENT-IDENTIFIER: US 6081623 A

** See image for Certificate of Correction **

TITLE: Method for lossless bandwidth compression of a series of glyphs 

DATE-ISSUED: June 27, 2000

INVENTOR-INFORMATION:

NAME                     CITY               STATE   ZIP CODE   COUNTRY
Bloomfield; Marc Alan    Lighthouse Point   FL
Krantz; Jeffrey Isaac    Boca Raton         FL

US-CL-CURRENT: 382/239; 341/106, 341/51, 358/426.13



□ 5. Document ID: US 5977889 A

L16: Entry 5 of 5    File: USPT    Nov 2, 1999



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US- PAT-NO: 597788 9 

DOCUMENT-IDENTIFIER: US 5977889 A

TITLE: Optimization of data representations for transmission of storage using 
differences from reference data 

DATE-ISSUED: November 2, 1999

INVENTOR-INFORMATION:

NAME                    CITY            STATE   ZIP CODE   COUNTRY
Cohen; Norman Howard    Spring Valley   NY

US-CL-CURRENT: 341/55; 341/87






Term                                                      Documents
INDEX$                                                    0
INDEX                                                     1483475
INDEXA                                                    69
INDEXAALE                                                 1
INDEXAB                                                   9
INDEXABE                                                  1
INDEXABI                                                  1
INDEXABIC                                                 2
INDEXABLE                                                 15
INDEXABELE                                                1
INDEXABILIT                                               1
(L15 AND INDEX$).PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD.      5



There are more results than shown above. Click here to view the entire set.



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Hit List 




Search Results - Record(s) 1 through 8 of 8 returned. 



□ 1. Document ID: US 20020136458 A1

L3: Entry 1 of 8    File: PGPB    Sep 26, 2002

PGPUB-DOCUMENT-NUMBER: 20020136458
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020136458 A1

TITLE: Method and apparatus for character string search in image 

PUBLICATION-DATE: September 26, 2002

INVENTOR-INFORMATION:

NAME                  CITY       STATE   COUNTRY   RULE-47
Nagasaka, Akio        Kodaira            JP
Miyatake, Takafumi    Hachioji           JP

US-CL-CURRENT: 382/209

□ 2. Document ID: US 6785677 B1

L3: Entry 2 of 8    File: USPT    Aug 31, 2004

US-PAT-NO: 6785677

DOCUMENT-IDENTIFIER: US 6785677 B1

TITLE: Method for execution of query to search strings of characters that match 
pattern with a target string utilizing bit vector 

DATE-ISSUED: August 31, 2004

INVENTOR-INFORMATION:

NAME                     CITY          STATE   ZIP CODE   COUNTRY
Fritchman; Barry Lynn    Lake Forest   CA

US-CL-CURRENT: 707/6; 707/3, 707/5, 707/7




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□ 3. Document ID: US 5841376 A 

L3: Entry 3 of 8    File: USPT    Nov 24, 1998

US-PAT-NO: 5841376

DOCUMENT-IDENTIFIER: US 5841376 A

TITLE: Data compression and decompression scheme using a search tree in which each 
entry is stored with an infinite-length character string 

DATE-ISSUED: November 24, 1998

INVENTOR-INFORMATION:

NAME                CITY       STATE   ZIP CODE   COUNTRY
Hayashi; Takaaki    Yokohama                      JP

US-CL-CURRENT: 341/51; 341/50



□ 4. Document ID: US 5377349 A

L3: Entry 4 of 8    File: USPT    Dec 27, 1994

US-PAT-NO: 5377349

DOCUMENT-IDENTIFIER: US 5377349 A

TITLE: String collating system for searching for character string of arbitrary 
length within a given distance from reference string 

DATE-ISSUED: December 27, 1994

INVENTOR-INFORMATION:

NAME                CITY    STATE   ZIP CODE   COUNTRY
Motomura; Masato    Tokyo                      JP

US-CL-CURRENT: 707/7; 712/17



□ 5. Document ID: US 4907194 A 

L3: Entry 5 of 8 File: USPT Mar 6, 1990 

US-PAT-NO: 4907194

DOCUMENT-IDENTIFIER: US 4907194 A 

** See image for Certificate of Correction ** 

TITLE: String comparator for searching for reference character string of arbitrary 
length 



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DATE-ISSUED: March 6, 1990 



INVENTOR-INFORMATION:

NAME                  CITY    STATE   ZIP CODE   COUNTRY
Yamada; Hachiro       Tokyo                      JP
Takahashi; Kousuke    Tokyo                      JP

US-CL-CURRENT: 365/49; 365/189.07, 365/189.08, 711/217






□ 6. Document ID: JP 2005062976 A 

L3: Entry 6 of 8 File: DWPI Mar 10, 2005 

DERWENT-ACC-NO : 2005-189521 
DERWENT-WEEK: 200520 

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Electronic form for character input, sets reference character string 
searched from character list, when single character is input to input column, as 
input character string 

PRIORITY-DATA: 2003 JP-0207937 (August 19, 2003)

PATENT-FAMILY:

PUB-NO             PUB-DATE          LANGUAGE   PAGES   MAIN-IPC
JP 2005062976 A    March 10, 2005               006     G06F017/22

INT-CL (IPC): G06 F 17/21; G06 F 17/22; G06 F 19/00



□ 7. Document ID: JP 2003323581 A

L3: Entry 7 of 8 File: DWPI Nov 14, 2003 

DERWENT-ACC-NO : 2003-889113 
DERWENT-WEEK: 200382 

COPYRIGHT 2005 DERWENT INFORMATION LTD 

TITLE: Account registration system for personal business companies, searches 
account headings based on input character string, and searched headings are 
automatically grouped based on input account book data 



PRIORITY-DATA: 2002 JP-0130956 (May 2, 2002) 



PATENT-FAMILY:

PUB-NO             PUB-DATE             LANGUAGE   PAGES   MAIN-IPC
JP 2003323581 A    November 14, 2003               019     G06F019/00

INT-CL (IPC): G06 F 17/60; G06 F 19/00



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□ 8. Document ID: US 5586288 A 

L3: Entry 8 of 8    File: DWPI    Dec 17, 1996



DERWENT-ACC-NO: 1997-051530 
DERWENT-WEEK: 199705 

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Memory interface chip with rapid search capability - has multiple registers 
to latch data from memory under control of microprocessor and to permit comparison 
of search strings and masking or special comparison characters provided by 
processor with data 

INVENTOR: DAHLBERG, B

PRIORITY-DATA: 1993US-0125315 (September 22, 1993)

PATENT-FAMILY:

PUB-NO          PUB-DATE             LANGUAGE   PAGES   MAIN-IPC
US 5586288 A    December 17, 1996               036     G06F012/00

INT-CL (IPC): G06 F 7/20; G06 F 12/00; G11 C 15/00






Term                                                      Documents
COLUMN$1                                                  0
COLUMN                                                    937400
COLUMNA                                                   237
COLUMNB                                                   16
COLUMNC                                                   18
COLUMND                                                   7
COLUMNE                                                   61
COLUMNF                                                   19
COLUMNG                                                   152
COLUMNH                                                   3
(L2 AND COLUMN$1).PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD.     8



There are more results than shown above. Click here to view the entire set. 
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Hit List 



Search Results - Record(s) 1 through 7 of 7 returned. 



□ 1. Document ID: US 6834283 B1

L4: Entry 1 of 7    File: USPT    Dec 21, 2004

US-PAT-NO: 6834283

DOCUMENT-IDENTIFIER: US 6834283 B1

TITLE: Data compression/decompression apparatus using additional code and method 
thereof 

DATE-ISSUED: December 21, 2004

INVENTOR-INFORMATION:

NAME             CITY       STATE   ZIP CODE   COUNTRY
Satoh; Noriko    Kanagawa                      JP

US-CL-CURRENT: 707/101; 704/3, 704/7






□ 2. Document ID: US 6470347 B1

L4: Entry 2 of 7    File: USPT    Oct 22, 2002

US-PAT-NO: 6470347

DOCUMENT-IDENTIFIER: US 6470347 B1

TITLE: Method, system, program, and data structure for a dense array storing 
character strings 

DATE-ISSUED: October 22, 2002

INVENTOR-INFORMATION:

NAME                        CITY        STATE   ZIP CODE   COUNTRY
Gillam; Richard Theodore    San Jose    CA

US-CL-CURRENT: 707/101; 704/10, 707/10, 707/104.1






□ 3. Document ID: US 6301394 B1

L4: Entry 3 of 7    File: USPT    Oct 9, 2001

US-PAT-NO: 6301394

DOCUMENT-IDENTIFIER: US 6301394 B1

TITLE: Method and apparatus for compressing data 

DATE-ISSUED: October 9, 2001

INVENTOR-INFORMATION:

NAME                   CITY         STATE   ZIP CODE   COUNTRY
Trout; H. Robert G.    San Diego    CA

US-CL-CURRENT: 382/244; 341/51, 341/87, 382/245

□ 4. Document ID: US 6247015 B1

L4: Entry 4 of 7    File: USPT    Jun 12, 2001

US-PAT-NO: 6247015

DOCUMENT-IDENTIFIER: US 6247015 B1

TITLE: Method and system for compressing files utilizing a dictionary array 

DATE-ISSUED: June 12, 2001

INVENTOR-INFORMATION:

NAME                          CITY      STATE   ZIP CODE   COUNTRY
Baumgartner; Jason Raymond    Austin    TX
Malik; Nadeem                 Austin    TX
Roberts; Steven Leonard       Austin    TX

US-CL-CURRENT: 707/101



□ 5. Document ID: US 6195664 B1

L4: Entry 5 of 7    File: USPT    Feb 27, 2001

US-PAT-NO: 6195664

DOCUMENT-IDENTIFIER: US 6195664 B1

TITLE: Method and system for controlling the conversion of a file from an input 
format to an output format 

DATE-ISSUED: February 27, 2001



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INVENTOR- INFORMATION: 

NAME                   CITY                    STATE   ZIP CODE   COUNTRY
Tolfa; Michael John    North Richland Hills    TX

US-CL-CURRENT: 707/200; 370/396, 707/10, 707/100, 707/101, 707/203, 709/222, 713/324



□ 6. Document ID: US 6121903 A

L4: Entry 6 of 7    File: USPT    Sep 19, 2000

US-PAT-NO: 6121903

DOCUMENT-IDENTIFIER: US 6121903 A

TITLE: On-the-fly data re-compression 

DATE-ISSUED: September 19, 2000

INVENTOR-INFORMATION:

NAME              CITY        STATE   ZIP CODE   COUNTRY
Kalkstein; Nir    Herzliya                       IL

US-CL-CURRENT: 341/63; 341/106, 341/65



□ 7. Document ID: US 5945933 A

L4: Entry 7 of 7    File: USPT    Aug 31, 1999

US-PAT-NO: 5945933

DOCUMENT-IDENTIFIER: US 5945933 A

TITLE: Adaptive packet compression apparatus and method 

DATE-ISSUED: August 31, 1999

INVENTOR-INFORMATION:

NAME              CITY        STATE   ZIP CODE   COUNTRY
Kalkstein; Nir    Herzliya                       IL

US-CL-CURRENT: 341/63; 341/106, 341/65
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Term 


Documents 


"5841376" 


8 


5841376$


0 


"5841376".UREF..PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD. 


7 


(5841376.UREF.).PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD.


7 
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Your wildcard search against 10000 terms has yielded the results below. 
Your result set for the last L# is incomplete. 
The probable cause is use of unlimited truncation. Revise your search strategy to use limited truncation. 



Search Results - Record(s) 1 through 2 of 2 returned. 



□ 1. Document ID: US 6785677 B1

L7: Entry 1 of 2    File: USPT    Aug 31, 2004

US-PAT-NO: 6785677

DOCUMENT-IDENTIFIER: US 6785677 B1

TITLE: Method for execution of query to search strings of characters that match 
pattern with a target string utilizing bit vector 

DATE-ISSUED: August 31, 2004

INVENTOR-INFORMATION:

NAME                     CITY          STATE   ZIP CODE   COUNTRY
Fritchman; Barry Lynn    Lake Forest   CA

US-CL-CURRENT: 707/6; 707/3, 707/5, 707/7






□ 2. Document ID: US 5377349 A

L7: Entry 2 of 2 File: USPT Dec 27, 1994

US-PAT-NO: 5377349

DOCUMENT-IDENTIFIER: US 5377349 A

TITLE: String collating system for searching for character string of arbitrary
length within a given distance from reference string

DATE-ISSUED: December 27, 1994

INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Motomura; Masato        Tokyo                                        JP



US-CL-CURRENT: 707/7; 






Record List Display 



Page 1 of 4 



Hit List 



Search Results - Record(s) 1 through 8 of 8 returned. 



□ 1. Document ID: US 6643647 B2

L8: Entry 1 of 8 File: USPT Nov 4, 2003

US-PAT-NO: 6643647

DOCUMENT-IDENTIFIER: US 6643647 B2

TITLE: Word string collating apparatus, word string collating method and address
recognition apparatus

DATE-ISSUED: November 4, 2003

INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Natori; Naotake         Kawasaki                                     JP

US-CL-CURRENT: 707/6; 704/254, 707/202, 707/7, 715/536



□ 2. Document ID: US 6512851 B2

L8: Entry 2 of 8 File: USPT Jan 28, 2003

US-PAT-NO: 6512851

DOCUMENT-IDENTIFIER: US 6512851 B2

TITLE: Word recognition device and method

DATE-ISSUED: January 28, 2003

INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Navoni; Loris           Cernusco Sul Naviglio                        IT
Canegallo; Roberto      Tortona                                      IT
Chinosi; Mauro          Cologno Monzese                              IT
Gozzini; Giovanni       Palazzolo Sull'Oglio                         IT
Kramer; Alan            Berkeley                  CA
Rolandi; Pierluigi      Volpedo                                      IT



US-CL-CURRENT: 382/22S; 711/108 



□ 3. Document ID: US 6442295 B2

L8: Entry 3 of 8 File: USPT Aug 27, 2002

US-PAT-NO: 6442295

DOCUMENT-IDENTIFIER: US 6442295 B2

TITLE: Word recognition device and method

DATE-ISSUED: August 27, 2002

INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Navoni; Loris           Cernusco sul Naviglio                        IT
Canegallo; Roberto      Tortona                                      IT
Chinosi; Mauro          Cologno Monzese                              IT
Gozzini; Giovanni       Palazzolo Sull'Oglio                         IT
Kramer; Alan            Berkeley                  CA
Rolandi; Pierluigi      Volpedo                                      IT

US-CL-CURRENT: 382/229; 707/6



□ 4. Document ID: US 6332195 B1

L8: Entry 4 of 8 File: USPT Dec 18, 2001

US-PAT-NO: 6332195

DOCUMENT-IDENTIFIER: US 6332195 B1

TITLE: Secure server utilizing separate protocol stacks

DATE-ISSUED: December 18, 2001

INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Green; Michael W.       Shoreview                 MN
Jensen; Andrew W.       Oakdale                   MN

US-CL-CURRENT: 713/201; 709/201



□ 5. Document ID: US 6144934 A

L8: Entry 5 of 8 File: USPT Nov 7, 2000

US-PAT-NO: 6144934

DOCUMENT-IDENTIFIER: US 6144934 A

TITLE: Binary filter using pattern recognition

DATE-ISSUED: November 7, 2000

INVENTOR-INFORMATION:
INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Hayashi; Takaaki        Yokohama                                     JP

US-CL-CURRENT: 341/51; 341/50






□ 11. Document ID: US 5337233 A

L11: Entry 11 of 19 File: USPT Aug 9, 1994

US-PAT-NO: 5337233

DOCUMENT-IDENTIFIER: US 5337233 A

TITLE: Method and apparatus for mapping multiple-byte characters to unique strings
of ASCII characters for use in text retrieval

DATE-ISSUED: August 9, 1994

INVENTOR-INFORMATION:
NAME                    CITY
Hofert; David K.        Hudson
Yoshida; Yutaka         Tokyo

US-CL-CURRENT: 715/540; 715/531




□ 12. Document ID: US 5265242 A

DOCUMENT-IDENTIFIER: US 5265242 A

TITLE: Document retrieval system for displaying document image data with inputted
bibliographic items and character string selected from multiple character
candidates

DATE-ISSUED: November 23, 1993

INVENTOR-INFORMATION:
NAME                    CITY                      STATE   ZIP CODE   COUNTRY
Fujisawa; Hiromichi     Tokorozawa-shi, Saitama                      JP
Hatakeyama; Atsushi     Kokubunji-shi, Tokyo                         JP
Nakano; Yasuaki         Hino-shi, Tokyo                              JP
Higashino; Junichi      Kogamei-shi, Tokyo                           JP
Hananoi; Toshihiro      Naka-gun, Kanagawa                           JP

US-CL-CURRENT: 707/3; 382/231, 382/305, 707/104.1, 715/530
File: DWPI Mar 10, 2005

DERWENT-ACC-NO: 2005-189646
DERWENT-WEEK: 200520

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Performance management support system judges whether input search question
sentence contains important word stored in specific index, and uses character
string which matches partially or fully as search key

PRIORITY-DATA: 2003 JP-0294851 (August 19, 2003)

PATENT-FAMILY:
PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
JP 2005063284 A     March 10, 2005                   034     G06F017/30

INT-CL (IPC): G06 F 12/00; G06 F 17/30



L11: Entry 14 of 19 File: DWPI Apr 22, 2004

DERWENT-ACC-NO: 2004-360651
DERWENT-WEEK: 200434

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Moving image search method involves searching character string which
corresponds to search key word, based on moving image search requirement from
client terminal

PATENT-FAMILY:
PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
JP 2004128710 A     April 22, 2004                   019     H04N005/76

INT-CL (IPC): G06 F 17/30; H04 N 5/76; H04 N 5/765
□ 15. Document ID: JP 2004128692 A

L11: Entry 15 of 19 File: DWPI Apr 22, 2004

DERWENT-ACC-NO: 2004-360649
DERWENT-WEEK: 200434

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Search index assigning method for moving image, involves matching selected
character strings and identification information of divided moving images, based on
which selected strings are assigned as search index for respective images

PRIORITY-DATA: 2002 JP-0287240 (September 30, 2002)

PATENT-FAMILY:
PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
JP 2004128692 A     April 22, 2004                   030     H04N005/765

INT-CL (IPC): G06 F 17/30; H04 N 5/76; H04 N 5/765; H04 N 5/91
□ 16. Document ID: JP 2004021746 A

L11: Entry 16 of 19 File: DWPI Jan 22, 2004

DERWENT-ACC-NO: 2004-137821
DERWENT-WEEK: 200414

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Character string retrieval result display method in electronic document
search system, extracts positional data about extracted keywords using position and
index files so as to generate highlighted data

PRIORITY-DATA: 2002 JP-0177711 (June 18, 2002)

PATENT-FAMILY:
PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
JP 2004021746 A     January 22, 2004                 016     G06F017/30

INT-CL (IPC): G06 F 17/21; G06 F 17/30
□ 17. Document ID: JP 2004094916 A, US 20040006460 A1

L11: Entry 17 of 19 File: DWPI Mar 25, 2004

DERWENT-ACC-NO: 2004-167553
DERWENT-WEEK: 200422

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Problem solving support system for users of electronic apparatus e.g.
portable telephone, extracts index word character strings that correspond to
element word, if input word is neither kana nor Roman alphabetic

INVENTOR: KATAYAMA, Y; NARUMI, T

PRIORITY-DATA: 2003 JP-0126068 (April 30, 2003), 2002 JP-0198204 (July 8, 2002)

PATENT-FAMILY:
PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
JP 2004094916 A     March 25, 2004                   034     G06F017/30
US 20040006460 A1   January 8, 2004                  044     G06F017/21

INT-CL (IPC): G06 F 17/21; G06 F 17/30; H04 M 11/00



□ 18. Document ID: JP 09212523 A

L11: Entry 18 of 19 File: DWPI Aug 15, 1997

DERWENT-ACC-NO: 1997-462392
DERWENT-WEEK: 199743

COPYRIGHT 2005 DERWENT INFORMATION LTD

TITLE: Sentence search method for CD-ROM publication, E-mail using computer, word
processor - involves searching character string by matching each character in
search character string with character string in document and also matching their
utilisation type

PRIORITY-DATA: 1996 JP-0035500 (January 30, 1996)

PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
JP 09212523 A       August 15, 1997                  013     G06F017/30

INT-CL (IPC): G06 F 17/30
DERWENT-ACC-NO: 1992-050835
DERWENT-WEEK: 199803

COPYRIGHT 2005 DERWENT INFORMATION LTD

reference numbers assigned to one character string and reader to detect matching of
input characters

INVENTOR: CHIBA, H; NAKANO, Y; OKADA, Y; YOSHIDA, S

PRIORITY-DATA: 1990 JP-0211295 (August 8, 1990), 1990 JP-0209566 (August 6, 1990)

PUB-NO              PUB-DATE              LANGUAGE   PAGES   MAIN-IPC
EP 470798 A                                          027
DE 69128053 E       December 4, 1997                 000     G06F017/30
JP 04095161 A       March 27, 1992                   012
US 5136289 A        August 4, 1992                   022     H03M007/40
EP 470798 A3        April 7, 1993                    027
EP 470798 B1        October 29, 1997                 028     G06F017/30
11 String search neuron for artificial neural networks

Inventor: SPECHT DONALD F (US); PAILLET GUY (US)   Applicant:

EC:   IPC: G06G7/00; G06F15/18; (+3)

Publication info: US2004122783 - 2004-06-24

12 BLINKING ANNOTATION CALLOUTS HIGHLIGHTING CROSS
LANGUAGE SEARCH RESULTS

Inventor: CHAN NING-PING (US)   Applicant: CHAN NING-PING (US)

EC: G06F17/24A; G06F17/30W1   IPC: G06F17/30; G06F17/28

Publication info: WO2004042615 - 2004-05-21

13 Methods and systems for search indexing

Inventor: GROSS WILLIAM (US); COLWELL STEVEN LEE (US)   Applicant:

EC:   IPC: G06F17/30

Publication info: US2004133564 - 2004-07-08

14 Automatic search method

Inventor: LEITERMANN THOMAS (DE)   Applicant:

EC: G06F17/30H2; G06F17/30H4   IPC: G06F7/00

Publication info: US2004030692 - 2004-02-12

15 Multi-language document search and retrieval system

Inventor: LOOFBOURROW WAYNE (US); CASSERES DAVID (US)   Applicant:

EC: G06F17/27; G06F17/30H2   IPC: G06F17/28

Publication info: US2004006456 - 2004-01-08

16 FULL REGULAR EXPRESSION SEARCH OF NETWORK TRAFFIC

Inventor: MATHUR ALOK; BEYLIN BORIS   Applicant: INTEL CORP (US)

EC: G06F7/02; G06F17/30G3; (+1)   IPC: G06F7/00

Publication info: WO03107173 - 2003-12-24

17 Method for accessing a storage unit during the search for substrings,
and a corresponding storage unit

Inventor: RABAIOLI GIOVANNI (US)   Applicant:

EC: G06F17/30P2F   IPC: G06F17/30

Publication info: US2004015498 - 2004-01-22

18 SEARCH DEVICE

Inventor: UCHIYAMA MASAO; ISAHARA HITOSHI   Applicant: NAT INST OF INFORMATION & COMM

EC:   IPC: G06F17/30

Publication info: JP2004280259 - 2004-10-07

19 SEARCH CHARACTER INPUT DEVICE, METHOD, STORAGE MEDIUM,
AND WWW BROWSER HAVING SEARCH CHARACTER INPUT WINDOW

Inventor: OKABE TOSHIHIKO; OKABE CHISATO   Applicant: OKABE TOSHIHIKO; OKABE CHISATO

EC:   IPC: G06F17/30; G06F3/00

Publication info: JP2004259216 - 2004-09-16

20 DATABASE SEARCH SYSTEM AND SEARCH METHOD, METHOD FOR
FORMING DATA FILE USED FOR SEARCH, AND STORAGE MEDIUM
STORING DATA FILE

Inventor: TERUI FUMIHIKO; NAKAMURA TOSHIYUKI   Applicant: IBM
MASK WITH ADJUSTABLE STRING

Inventor: KIM SANG SIK (KR)   Applicant: HYUPWOO TRADING CO LTD (KR)

EC:   IPC: A62B18/00

Publication info: KR188193Y - 2000-07-15

DIVING MASK WITH INCLINED GLASS

Inventor: GODOY CARLOS ALBERTO   Applicant: CRESSI SUB SPA

EC: B63C11/12   IPC: A63B33/00; B63C11/12

Publication info: JP2000202064 - 2000-07-25

MASK HOLDING MEMBER AND COMPOSITE MASK USING THE SAME

Inventor: ITO HARUHIKO; ENOMOTO SATOSHI   Applicant: JAPAN VILENE CO LTD

EC:   IPC: A62B18/08

Publication info: JP2001104501 - 2001-04-17

MASK WITH POCKET FOR FIXING MOUTH CLOTH TO BACK FACE

Inventor: HASE AKIKO   Applicant: HASE AKIKO

EC:   IPC: A62B18/02

Publication info: JP2000288107 - 2000-10-17

PHOTOCATALYTIC SANITARY MASK

Inventor: SATO YASUO   Applicant: EKUSERU LIGHT KK

EC:   IPC: A62B18/02; B01J35/02

Publication info: JP2000202052 - 2000-07-25

MASK AND MASK COVER

Inventor: TAKESHIMA KATSUMARO; INOUE RYUSUKE; (+1)   Applicant: TM ADOTEKKU KK; KITAMURA SHOKO KK; (+1)

EC:   IPC: A62B18/02

Publication info: JP2000197711 - 2000-07-18

Two mask method for reducing field oxide encroachment in memory
arrays

Inventor: HSIEH CHIA-TA (TW); KUO DI-SON (TW); (+2)   Applicant: TAIWAN SEMICONDUCTOR MFG (TW)

EC: H01L21/762B; H01L21/8242   IPC: H01L21/8247; H01L21/8242; (+1)

Publication info: US5976927 - 1999-11-02

COMMON SPECTACLE MOUNTING FULL FACE MASK

Inventor: HARUKAWA JUNICHI; YOKOTA YOSHIHIRO; (+2)   Applicant: JAPAN TECH RES & DEV INST

EC:   IPC: A62B18/08; A62B18/02

Publication info: JP11267235 - 1999-03-24

Mask protective device

Inventor: NAKAGAWA HIROAKI (JP); KONDO MASAHIRO (JP)   Applicant: MITSUI CHEMICALS INC (JP)

EC: G03F1/14D   IPC: G03F1/14

Publication info: US6083577 - 2000-07-04

MASK WITH ADHESIVE

Inventor: HOSODA MINORU   Applicant: HOSODA MINORU

EC: A41D13/11C2B

Publication info: JP10248948 - 1998-09-22
Incremental search of keyword strings

Inventor: TOSEY JOSEPH PETER R (CA)   Applicant: SIERRA WIRELESS INC A CANADIAN

EC:   IPC: G06F17/00

Publication info: US2005086234 - 2005-04-21

ADVERTISING BASED ON A SEARCH STRING AND USER ATTRIBUTE
COMBINATION

Inventor: RAINEY JIM E (US)   Applicant: ASK JEEVES INC (US); RAINEY JIM E (US)

EC:   IPC: H04L

Publication info: WO2005029745 - 2005-03-31

Reverse search system and method

Inventor: SCHACHAM ALON (IL); MATZA AVI (IL); (+1)   Applicant:

EC: G06F17/30P2S9; G11C15/00   IPC: G06F12/00

Publication info: US2005050260 - 2005-03-03

Search system, search program, and personal computer

Inventor: TANAKA TAKASHIGE (JP); KASAI TSUNEO (JP); (+1)   Applicant: SEIKO EPSON CORP (JP)

EC:   IPC: G06F17/30

Publication info: EP1510948 - 2005-03-02

Method using the digit string grouped by several specified phone
numbers sequentially to search specific e-mail address

Inventor: LIN JUNG-YU (TW)   Applicant: LIN JUNG-YU (TW)

EC:   IPC: G06F17/30

Publication info: TW594505 - 2004-06-21

PDA with dictionary search and repeated voice reading function

Inventor: LAI CHENG-SHING (TW); TU XUWEI (CN)   Applicant: INVENTEC APPLIANCES CORP (TW)

EC:   IPC: G06F3/00

Publication info: TW591486 - 2004-06-11

Systems and methods that employ a distributional analysis on a query
log to improve search results

Inventor: BRILL ERIC D (US); CARMICHAEL PHILIP F (US)   Applicant:

EC:   IPC: G06F7/00

Publication info: US2004254920 - 2004-12-16

TELEVISION RECEIVING APPARATUS AND ITS PROGRAM
INFORMATION SEARCH METHOD

Inventor: KONISHI MASAHITO   Applicant: MATSUSHITA ELECTRIC IND CO LTD

EC:   IPC: H04N5/445; G06F17/30; (+1)

Publication info: JP2004312627 - 2004-11-04

Edit distance string search

Inventor: BAX ERIC THEODORE (US); SWETT IAN DOUGLAS (US)   Applicant:

EC: G06F7/02   IPC: G06F7/00

Publication info: US2004220920 - 2004-11-04

Service search device and method, and client device using service
search device

Inventor: YOSHIMURA KOICHI (JP); OKUYAMA JUNICHI   Applicant: FUJI XEROX CO LTD (JP)